I'm about to transition our log collection fleet to ILM (Index Lifecycle Management). It looks promising, but I have a couple of questions.
Currently we use time-based (daily) indices. Filebeat sends to a local Logstash, and Logstash bulk indexes into Elasticsearch. With this setup it is relatively easy to know which index a log will end up in - it's based on "@timestamp".
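For reference, our current Logstash output looks roughly like this (hosts and index name are illustrative, not our exact config):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Daily index derived from the event's @timestamp,
    # so the target index is always predictable.
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
```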
However, with ILM a new index is rolled over once the current one reaches a given size or age. It also looks like older indices can be marked read-only, shrunk, and force-merged. This is a great feature.
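For context, here is a rough sketch of the kind of ILM policy I have in mind (the policy name and thresholds are placeholders, not our actual settings):

```
PUT _ilm/policy/filebeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "readonly": {},
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```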
However, what happens if a specific node loses network connectivity for, say, 12 hours? During that time a new index is rolled over. When the network comes back, Filebeat/Logstash will catch up and index the older logs. Will these older logs appear in the new index? If so, it seems that deleting old logs by index no longer guarantees that logs are deleted in @timestamp order. This isn't really an issue for us, I'm just wondering.
Another question - how does logstash discover the new index name to send index requests to? Does it just index to a pattern and ES handles the correct index?
Correct - if the node loses network for 12 hours and a rollover happens in the meantime, then when it comes back online the late logs will be indexed into the most recent index. This is because indexing happens through an alias rather than a timestamp-derived index name. You would just have to be aware when deleting indices that a newer index can contain some older documents, so deletion by index no longer maps exactly to deletion by @timestamp.
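A quick way to see this in action, assuming a rollover alias named "filebeat" (adjust to whatever alias you configure):

```
# Indexing goes through the alias; the document lands in whichever
# backing index currently has is_write_index: true, regardless of
# its @timestamp.
POST filebeat/_doc
{
  "@timestamp": "2019-06-01T00:00:00Z",
  "message": "late log that arrived after the outage"
}

# Shows which concrete backing index is currently the write index.
GET _alias/filebeat
```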
> Another question - how does logstash discover the new index name to send index requests to? Does it just index to a pattern and ES handles the correct index?
It uses an alias marked as the "write index". When rollover is set up, all indexing requests go through this alias, and the alias's write index is updated each time a rollover occurs.
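If you are using the Logstash elasticsearch output, the relevant settings look roughly like this (available in recent versions of the plugin; the alias, pattern, and policy names here are just examples):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Logstash writes to the rollover alias, not to a concrete index name.
    # Elasticsearch routes each request to the backing index that is
    # currently flagged with is_write_index: true.
    ilm_enabled        => true
    ilm_rollover_alias => "filebeat"
    ilm_pattern        => "{now/d}-000001"
    ilm_policy         => "filebeat-policy"
  }
}
```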
Hopefully that clarifies things - thanks for taking a look at ILM!