ILM and indexing older logs after a period of downtime

Hi,

I'm about to transition our log collection fleet to using ILM. Looks promising, but I have a question.

Currently we use time-based (daily) indices. Filebeat sends to a Logstash on localhost, and Logstash bulk indexes to Elasticsearch. With this setup it is relatively easy to know which index a log will appear in - it's based on "@timestamp".
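To make the current scheme concrete, here is a rough sketch of how a daily index name follows from "@timestamp". The `logstash` prefix and the `daily_index` helper are assumptions for illustration, not our actual config; the date is taken in UTC, which is how Logstash formats the date part of the index name.

```python
from datetime import datetime, timezone


def daily_index(timestamp: str, prefix: str = "logstash") -> str:
    """Return the daily index name for an ISO 8601 @timestamp (hypothetical helper)."""
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    # Normalise to UTC before formatting, so the day boundary is the UTC one.
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


print(daily_index("2019-03-05T23:59:59Z"))
```

The target index depends only on the event's own timestamp, so a log that arrives 12 hours late still lands in the index for the day it was written.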

With ILM, however, a new index is rolled over after a given size or time period. It also looks like older indices can be marked as read-only, shrunk, and merged. This is a great feature.

What happens, though, if a specific node loses network for 12 hours? During this time, a new index is rolled over. When the network comes back, Filebeat/Logstash will catch up and index the older logs. Will these older logs appear in the new index? If so, it seems that deleting old logs by index no longer guarantees that logs are deleted in @timestamp order. This isn't particularly an issue, but I'm just wondering.

Another question - how does logstash discover the new index name to send index requests to? Does it just index to a pattern and ES handles the correct index?

Thanks,
Justin

If the node loses network for 12 hours and a rollover happens during that time, then when it comes back online it will index into the most recent index. This is because indexing happens through an alias. When deleting indices, you would have to keep in mind that an index may hold documents whose @timestamp is older than the index itself.
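A minimal sketch of that behaviour, with no Elasticsearch involved: the index names (`logs-000001`, `logs-000002`), the rollover time, and the events are all made up for illustration. The only point is that the target index depends on when the request arrives, not on the event's @timestamp.

```python
from datetime import datetime

# Hypothetical timeline: the alias rolls over from logs-000001 to logs-000002 at 12:00.
ROLLOVER_AT = datetime(2019, 1, 1, 12, 0)


def target_index(arrival: datetime) -> str:
    """The index is chosen by arrival time at Elasticsearch, not by @timestamp."""
    return "logs-000001" if arrival < ROLLOVER_AT else "logs-000002"


# (event @timestamp, time the bulk request actually arrives)
events = [
    (datetime(2019, 1, 1, 9, 0), datetime(2019, 1, 1, 9, 0)),    # healthy node, on time
    (datetime(2019, 1, 1, 10, 0), datetime(2019, 1, 1, 22, 0)),  # node offline for 12 hours
    (datetime(2019, 1, 1, 13, 0), datetime(2019, 1, 1, 13, 0)),  # healthy node, after rollover
]

by_index = {}
for timestamp, arrival in events:
    by_index.setdefault(target_index(arrival), []).append(timestamp)

# Print the @timestamp range each index ends up holding.
for name, stamps in sorted(by_index.items()):
    print(name, min(stamps).isoformat(), max(stamps).isoformat())
```

In this toy run the 10:00 event ends up in `logs-000002`, so deleting `logs-000001` first does not remove everything older than the rollover time.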

Another question - how does logstash discover the new index name to send index requests to? Does it just index to a pattern and ES handles the correct index?

It uses an alias that has one index marked as the "write index". When rollover is set up, all indexing requests go through this alias, and the write index is updated when rollover occurs.
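As a rough illustration only, here is a toy in-memory model of that mechanism. The `RolloverAlias` class and the `logs` alias name are hypothetical and this is not the Elasticsearch client API; the one thing borrowed from real rollover is the zero-padded numeric suffix that increments on each new index.

```python
class RolloverAlias:
    """Toy model of an alias whose write index moves on rollover."""

    def __init__(self, name, first_index):
        self.name = name
        self.indices = {first_index: []}  # index name -> stored documents
        self.write_index = first_index    # the single index flagged as write index

    def rollover(self):
        # Bump the six-digit numeric suffix: logs-000001 -> logs-000002.
        prefix, _, counter = self.write_index.rpartition("-")
        new_index = f"{prefix}-{int(counter) + 1:06d}"
        self.indices[new_index] = []
        self.write_index = new_index
        return new_index

    def index(self, doc):
        # Clients only address the alias; it resolves to the current write index.
        self.indices[self.write_index].append(doc)
        return self.write_index


alias = RolloverAlias("logs", "logs-000001")
first = alias.index({"message": "before rollover"})
alias.rollover()
second = alias.index({"message": "after rollover"})
print(first, second)
```

The sender never needs to learn the new index name: it keeps writing to the same alias, and only the alias's write index changes.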

Hopefully that clarifies, thanks for taking a look at ILM!

That makes sense, thanks for clarifying @dakrone!
