Set a _ttl value at the index level?


(Casie Owen) #1

In Elasticsearch, a _ttl value can be set for documents or types, but can a _ttl value be set at the index level? Either for individual indices as they're created, or globally for all indices?

Asking because we want to implement a retention policy that deletes indices, not just documents.

Thanks in advance! Casie


(Mark Walkom) #2

It cannot. Use a template if you need to do this.


(Casie Owen) #3

Thanks for the quick reply! So, if I set a _ttl value in an index template and then use that template to create an index, the _ttl value would apply to the index itself. So, if the _ttl value in the template was 1 day, the index itself would be deleted after one day (assuming the indices.ttl.interval is appropriate)?

If so, in terms of the syntax, is the below correct:

curl -XPUT localhost:9200/_template/template_1 -d '
{
  "template" : "*",
  "order" : 0,
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "type1" : {
      "_source" : { "enabled" : false },
      "_ttl" : { "enabled" : true, "default" : "1d" }
    }
  }
}'

Also, is there any documentation about setting _ttl in an index creation template?


(Aaron Mildenstein) #4

As @warkolm stated, there is no index-level TTL presently. There's a feature request issue out for it (I forget the number), but it has not yet been added.

If you need to delete entire indices, I recommend using something like Elasticsearch Curator, rather than using a TTL to cover all documents. There are good reasons for this. Quoted from this post:

TTLs at first seem like a good idea. "Oh! I can just set this up and it will auto-prune when it hits the pre-defined TTL." The reality is that while this works, it is a Really Bad Idea™ with time-series data, where you know it will always expire in a predictable way.

TTLs force Elasticsearch to check every single document, every 60 seconds (an editable default, but the principle remains). If I have 1,000,000,000 records per day, then I have as many as 1,000,000,000 document TTLs being checked every 60 seconds, with a 1 day TTL. You can imagine the strain that puts on the disk subsystem, not to mention the hit it would be to queries. On top of this, a TTL-deleted document is not immediately deleted. It is marked for deletion (yep, another I/O operation), and then the delete happens at the next segment merge. Segment merges will, of necessity, be very frequent because of TTLs, which adds to the disk I/O strain. Even if I configure the TTL check to be less frequent (hourly, or even daily), I will still have 1,000,000,000 "mark for deletion" operations, followed immediately by a kajillion segment merges. Oh, and you don't get to choose when the first TTL check happens, so it could be during high-use times.

On the other hand, deleting an entire index at once with the index delete API (which is what Curator uses), eliminates every document in a few seconds (because it deletes at the index level), with no more segment merges or disk I/O pain than that.

If you were to compare these two models to SQL commands, the first (TTLs) would be like:

DELETE FROM TABLE WHERE TIMESTAMP < now-24h;

and the second model would be like:

DROP TABLE TABLENAME;

You can see that the first is going to be millions of atomic operations, while the second just drops the entire table. That's what deleting an index vs. TTLs is like, and why TTLs are a Really Bad Idea™ for time-series data.
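
For anyone who wants to see what the index-level approach amounts to, here is a minimal sketch in Python. It is not how Curator is implemented. It assumes daily indices named with a hypothetical "logs-YYYY.MM.DD" pattern and a node at localhost:9200; the function names, the prefix, and the retention period are all illustrative. The only Elasticsearch call used is the index delete API (an HTTP DELETE on the index name).

```python
import re
from datetime import date, datetime, timedelta
from urllib import request


def expired_indices(names, today, keep_days, prefix="logs-"):
    """Return the daily indices whose date suffix is older than keep_days.

    The "logs-YYYY.MM.DD" naming scheme is an assumption for this sketch.
    """
    cutoff = today - timedelta(days=keep_days)
    pattern = re.compile(re.escape(prefix) + r"(\d{4}\.\d{2}\.\d{2})$")
    expired = []
    for name in names:
        match = pattern.match(name)
        if not match:
            continue  # not one of our daily indices, leave it alone
        try:
            day = datetime.strptime(match.group(1), "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if day < cutoff:
            expired.append(name)
    return expired


def delete_index(name, host="http://localhost:9200"):
    """Drop a whole index with one call to the index delete API."""
    req = request.Request(f"{host}/{name}", method="DELETE")
    with request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical index names; a real script would list them from the cluster.
    names = ["logs-2015.06.01", "logs-2015.06.05", "logs-2015.06.09", "kibana"]
    for name in expired_indices(names, date(2015, 6, 10), keep_days=7):
        print("would delete", name)  # call delete_index(name) to actually drop it
```

The point of the sketch is that retention becomes one request per expired index, regardless of how many documents each index holds.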


(Casie Owen) #5

Hi!

Curator is helping for now.

Do you know approximately when the feature request for setting retention policies on indices will be available?

Thanks,
Casie


(Aaron Mildenstein) #6

I do not.


(Casie Owen) #7

Any idea how I would track that feature?

