Notes on Cleaning Up Old Elasticsearch Indices

Preface

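Elasticsearch is used to store and search massive volumes of data. With large volumes of log data, keeping everything in a single index,
like one big database table, makes searches slower and slower over time and puts growing pressure on the servers. Whether you split indices by day or by month, the index name usually carries a date.
Put plainly, an index that has passed a certain age should no longer receive new data, so it can be force-merged, set to read-only, and so on.
For search, the fewer segments each shard holds after merging, the faster queries and sorts generally are (think of deep pagination...).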
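
As a minimal sketch of the two operations mentioned above, the functions below block writes to an old index and then force-merge it. The index settings key index.blocks.write and the _forcemerge endpoint with max_num_segments are standard Elasticsearch REST APIs; the URL, the credentials, and the index name logs-2020.01.01 are placeholders, not values from a real cluster.

```bash
ESURL=http://your.es.ip:9200
ES_AUTH=elastic:tianyan   # placeholder credentials, replace with your own

# Block writes to the index named in $1; searches still work.
block_writes() {
  curl -s -u "$ES_AUTH" -XPUT "$ESURL/$1/_settings" \
       -H 'Content-Type: application/json' \
       -d '{"index.blocks.write": true}'
}

# Merge every shard of the index named in $1 down to a single segment.
force_merge() {
  curl -s -u "$ES_AUTH" -XPOST "$ESURL/$1/_forcemerge?max_num_segments=1"
}
```

A typical call would be: block_writes logs-2020.01.01 && force_merge logs-2020.01.01. Force merging is I/O heavy, so it is probably best run during quiet hours and only on indices that no longer receive writes.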
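Our documents are stored and indexed in shards, but the application does not talk to shards directly; it talks to indices. As with querying a database table, we may only need the last two days of data to get the result we want, rather than scanning one big table that holds everything.
With data split into time-based indices, the recent indices that are actually queried are small enough to stay largely in memory (the filesystem cache), so queries rarely have to hit the disk and are fast.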
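
To illustrate querying only recent data, the sketch below builds the comma-separated list of daily index names for the last N days, which can be used directly in a multi-index search URL. It assumes GNU date and a hypothetical naming scheme of a prefix followed by YYYY.mm.dd; the function name recent_indices and the prefix logs- are made up for this example.

```bash
# recent_indices PREFIX DAYS [BASE_DATE]
# Print the names of the daily indices for the last DAYS days,
# newest first, joined by commas. BASE_DATE defaults to today.
recent_indices() {
  local prefix=$1 days=$2 base=${3:-$(date +%Y-%m-%d)}
  local names="" i=0 d
  while [ "$i" -lt "$days" ]; do
    d=$(date -d "$base -$i days" +%Y.%m.%d)
    names="${names:+$names,}$prefix$d"
    i=$((i + 1))
  done
  printf '%s\n' "$names"
}
```

For example, a search over the last two days could target "$ESURL/$(recent_indices logs- 2)/_search". If one of the daily indices may not exist, adding ignore_unavailable=true to the query string keeps the request from failing.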

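Strategy for cleaning up expired data

Elasticsearch exposes an API for deleting indices, so we can remove expired data with a shell script.
Create the script es-index-delete.sh: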

#!/bin/bash
# Requires GNU date (for "date -d").
ESURL=http://your.es.ip:9200

# Delete the index whose name is passed as $1
function delete_es_index(){
  printf 'deleting index: %s\n' "$1"
  curl -s -u elastic:tianyan -XDELETE "$ESURL/$1" > /dev/null 2>&1
}

# Cutoff: midnight 7 days ago, as epoch seconds
last7Day=$(date -d "-7 days" +%Y-%m-%d)
last7DaySec=$(date -d "$last7Day" +%s)

# Loop over the index names returned by the _cat/indices API
# (without the "v" parameter, so no header row is printed)
for index in $(curl -s -u elastic:tianyan -XGET "$ESURL/_cat/indices?h=i" | awk '{print $1}' | sort -u);
do
    # The name is expected to end in YYYY-mm-dd or YYYY.mm.dd
    day=${index: -2}
    month=${index: -5:2}
    year=${index: -10:4}

    # Skip indices whose suffix is not a valid date
    if ! currentIndexDateSec=$(date -d "$year-$month-$day" +%s 2>/dev/null); then
        echo "skipping index without a date suffix: $index"
        continue
    fi

    if [ "$currentIndexDateSec" -lt "$last7DaySec" ]; then
        delete_es_index "$index"
    fi
done
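
The script above picks the date out of the index name by character position. As a hedged alternative, the sketch below validates the suffix with a pattern first, so names that do not end in a date are never passed to date. The function names index_date and is_expired are hypothetical helpers introduced here, and GNU date is assumed, as in the script.

```bash
# Print the trailing YYYY-mm-dd or YYYY.mm.dd of an index name as
# YYYY-mm-dd; fail if the name has no valid date suffix.
index_date() {
  local d
  d=$(printf '%s\n' "$1" |
      sed -n 's/^.*\([0-9]\{4\}\)[-.]\([0-9]\{2\}\)[-.]\([0-9]\{2\}\)$/\1-\2-\3/p')
  [ -n "$d" ] && date -d "$d" +%Y-%m-%d 2>/dev/null
}

# is_expired INDEX DAYS [BASE_DATE]
# Succeed when the index date is more than DAYS days before BASE_DATE
# (default: today). Indices without a date suffix never expire.
is_expired() {
  local idx_date cutoff
  idx_date=$(index_date "$1") || return 1
  cutoff=$(date -d "${3:-today} -$2 days" +%Y-%m-%d)
  [ "$(date -d "$idx_date" +%s)" -lt "$(date -d "$cutoff" +%s)" ]
}
```

Inside the loop, the check would then read: if is_expired "$index" 7; then delete_es_index "$index"; fi. Printing the matching names first, without deleting, is a cheap way to dry-run the rule before enabling it.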

Running the script on a schedule

Run the deletion every day at 06:10:

crontab -e
10 6 * * * bash /tmp/es-index-delete.sh > /dev/null 2>&1

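Other approaches

Elastic provides an official tool: elasticsearch-curator
- Repository: https://github.com/elastic/curator
Alternatively, implement the cleanup in Java, using Spring's @Scheduled or the Quartz framework.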

独行侠梦
