Hello Master Elasticsearch Gurus,
I am new to ELK and last year implemented a simple Elasticsearch server (ver 7.4.0, I know, I have to upgrade) that is archiving some simple production user data. When I set up ES, I thought it would only be a temporary thing, so I did a “bare bones” implementation. But now, ES has proven its worthiness, and my boss would like the server to collect data for a few months.
This is a problem, because my only index is growing at an unsustainable rate. Sooner or later, I’ll have too many documents and, well, I don’t want to think about what happens next:
[root@Linux elasticsearch]# curl -X GET "localhost:9200/_cat/indices?v&pretty"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open myIndex 12345-A-12345abcde1234 1 1 1227840 0 284.5mb 284.5mb
[root@Linux elasticsearch]#
See? 1,227,840 documents. This index is about to blow up.
The thing is, I really don’t need those documents after, say, a week. Once a document is over seven days old, it can be permanently deleted. The index should live forever, but I need to automate a way to clean out old data.
Being an ES newbie, I’ve read through the online documentation, and I think what I want is a rollup job. (An Index Lifecycle Management policy seems like the wrong approach – I don’t want to ever phase out the index.) The catch with a rollup job is, I don’t really want to roll up any data. After X amount of time, I want to clean out the old data, never to be seen or summarized again. I don’t need to inspect the old data before it gets thrown out; I just need it gone.
Working through the Rollup Jobs section on ES’s online documentation, I think what I need to do is this:
curl -X PUT "localhost:9200/_rollup/job/myCleanUp?pretty" -H 'Content-Type: application/json' -d'
{
  "index_pattern": "myIndex",
  "rollup_index": "willNeverUse",
  "cron": "0 */30 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    }
  }
}
'
In other words: Every thirty minutes, check through index myIndex. If you see any documents older than seven days, roll them up into a rollup index named willNeverUse. But don’t actually preserve or summarize any data.
On paper, this looks correct. I don’t dare try to implement it as is because my ES is technically in production.
But this solution is kind of silly, right? I am rolling up a lot of nonexistent data. And while myIndex will remain small and manageable, willNeverUse will continue to grow and grow with nothing but date histogram metadata. Sooner or later, that rollup_index will balloon to an unmanageable size. I’m just kicking the can down the road.
Isn’t there a more direct approach? Can’t I just configure ES to delete all documents in myIndex that are older than 7 days? Thank you.
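To make the question concrete, here is a minimal sketch of what such a direct approach might look like. It assumes the Delete By Query API with a range query on the same timestamp field used above, and that timestamp is mapped as a date. The purge_old_docs function and the RUN switch are hypothetical names for illustration only. By default the script only prints the request it would send, so nothing touches the production index.

```shell
#!/bin/sh
# Range query matching documents whose timestamp is more than 7 days old.
# Assumption: "timestamp" is mapped as a date field in myIndex.
QUERY='{"query":{"range":{"timestamp":{"lt":"now-7d"}}}}'

# Hypothetical helper: prints the request by default (dry run).
# Set RUN=1 to actually send it to the local node.
purge_old_docs() {
  if [ "${RUN:-0}" = "1" ]; then
    curl -s -X POST "localhost:9200/myIndex/_delete_by_query?pretty" \
      -H 'Content-Type: application/json' -d "$QUERY"
  else
    echo "POST /myIndex/_delete_by_query $QUERY"
  fi
}

purge_old_docs
```

If something like this turned out to be the right tool, it could presumably be run on a schedule (for example from the system crontab) rather than through a rollup job.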