What is the definitive way of only retaining 7 days of logs

Interesting, so you would but the curator rule in cron and just run it daily?

Yes.

What about the TTL stuff? is it better to avoid it and stick to default template?

Oh, yes.

TTLs at first seem like a good idea. "Oh! I can just set this up and it will auto-prune when it hits the pre-defined TTL." The reality is that while this works, it is a Really Bad Idea™ with time-series data, where you know it will always expire in a predictable way.

TTLs force Elasticsearch to check every single document, every 60 seconds (an editable default, but the principle remains). If I have 1,000,000,000 records per day, then I have as many as 1,000,000,000 documents TTLs being checked every 60 seconds, with a 1 day TTL. You can imagine the strain that puts on the disk subsystem, not to mention the hit it would be to queries. On top of this, a TTL-deleted document is not immediately deleted. It is marked for deletion (yep, another I/O operation), and then the delete happens at the next segment merge. Segment merges will, of necessity, be very frequent because of TTLs, which adds to the disk I/O strain. Even if I configure the TTL check to be less frequent (hourly, or even daily), I will still have 1,000,000,000 "mark for deletion" operations, followed immediately by a kajillion segment merges. Oh, and you don't get to choose when the first TTL check happens, so it could be during high use times.

On the other hand, deleting an entire index at once with the index delete API (which is what Curator uses), eliminates every document in a few seconds (because it deletes at the index level), with no more segment merges or disk I/O pain than that.

If you were to compare these two models to SQL commands, the first (TTLs) would be like:

DELETE FROM TABLE WHERE TIMESTAMP < now-24h;

and the second model would be like:

DROP TABLE TABLENAME;

You can see that the first is going to be millions of atomic operations, while the second just drops the entire table. That's what deleting an index vs. TTLs is like, and why TTLs are a Really Bad Idea™ for time-series data.

3 Likes