Set a _ttl value at the index level?

As @warkolm stated, there is no index-level TTL presently. There's a feature request issue out for it (I forget the number), but it has not yet been added.

If you need to delete entire indices, I recommend using something like Elasticsearch Curator, rather than using at TTL to cover all documents. There are good reasons for this. Quoted from this post:

TTLs at first seem like a good idea. "Oh! I can just set this up and it will auto-prune when it hits the pre-defined TTL." The reality is that while this works, it is a Really Bad Idea™ with time-series data, where you know it will always expire in a predictable way.

TTLs force Elasticsearch to check every single document, every 60 seconds (an editable default, but the principle remains). If I have 1,000,000,000 records per day, then I have as many as 1,000,000,000 documents TTLs being checked every 60 seconds, with a 1 day TTL. You can imagine the strain that puts on the disk subsystem, not to mention the hit it would be to queries. On top of this, a TTL-deleted document is not immediately deleted. It is marked for deletion (yep, another I/O operation), and then the delete happens at the next segment merge. Segment merges will, of necessity, be very frequent because of TTLs, which adds to the disk I/O strain. Even if I configure the TTL check to be less frequent (hourly, or even daily), I will still have 1,000,000,000 "mark for deletion" operations, followed immediately by a kajillion segment merges. Oh, and you don't get to choose when the first TTL check happens, so it could be during high use times.

On the other hand, deleting an entire index at once with the index delete API (which is what Curator uses), eliminates every document in a few seconds (because it deletes at the index level), with no more segment merges or disk I/O pain than that.

If you were to compare these two models to SQL commands, the first (TTLs) would be like:

DELETE FROM TABLE WHERE TIMESTAMP < now-24h;
and the second model would be like:

DROP TABLE TABLENAME;
You can see that the first is going to be millions of atomic operations, while the second just drops the entire table. That's what deleting an index vs. TTLs is like, and why TTLs are a Really Bad Idea™ for time-series data.

2 Likes