I was looking for a way to delete old logs, and it was suggested that I set up ILM.
I only now realize that ILM acts on the entire index rather than on individual old entries. If I understand correctly, if I set ILM to move an index to the delete phase after 3 months, it would delete the entire index once it is 3 months old.
Is there a way to set it up so that after 3 months, only entries older than 3 months are deleted? Is it possible to do the same for storage (once the index reaches 250GB, delete the oldest entries)?
You'd need to create a custom process to handle that with a delete-by-query.
It's highly inefficient though, and you'd be better off using ILM and deleting entire indices instead.
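For reference, here is a minimal sketch of what the request body for such a delete-by-query could look like, written as a Python dict so it can be printed or handed to whatever HTTP client you use. The field name `@timestamp` and the `90d` value (roughly 3 months) are assumptions, and `build_delete_by_query` is just an illustrative helper name, so adjust them to your own mapping and retention period.

```python
import json


def build_delete_by_query(timestamp_field="@timestamp", max_age="90d"):
    """Build the JSON body for POST /<index>/_delete_by_query.

    Matches every document whose timestamp is older than `max_age`,
    using Elasticsearch date math (e.g. "now-90d").
    """
    return {
        "query": {
            "range": {
                # Assumed field name; use whichever date field your logs carry.
                timestamp_field: {"lt": f"now-{max_age}"}
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into Dev Tools or sent with a client.
    print(json.dumps(build_delete_by_query(), indent=2))
```

You would have to run this on a schedule yourself (cron or similar). Every matching document is deleted individually and the space only comes back as segments get merged, which is why it costs so much more than dropping a whole index.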
Best practice is to use time-based indices, where each document is assigned to an index based on its timestamp. The oldest indices therefore hold the oldest data, and indices are deleted oldest first, once all the data in them has exceeded the retention period.
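To make that concrete, here is a rough sketch of the two pieces involved: an ILM policy body that rolls over to a new index every day and deletes indices after about 3 months, plus a small helper showing which daily indices would be past retention on a given date. The `logs-YYYY.MM.DD` naming, the `1d` and `90d` values, and the helper names are all assumptions for illustration, not something taken from your setup.

```python
from datetime import date, datetime, timedelta


def ilm_policy(rollover_after="1d", delete_after="90d"):
    """Body for PUT _ilm/policy/<policy-name> (the policy name is up to you)."""
    return {
        "policy": {
            "phases": {
                # Hot phase: keep writing until the index is a day old,
                # then roll over to a fresh index.
                "hot": {"actions": {"rollover": {"max_age": rollover_after}}},
                # Delete phase: with rollover in use, min_age is counted
                # from the moment the index was rolled over.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def index_date(name, prefix="logs-"):
    """Parse the day out of an assumed 'logs-YYYY.MM.DD' index name."""
    return datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()


def expired_indices(names, today, retention_days=90, prefix="logs-"):
    """Return the indices whose whole day lies before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [n for n in names if index_date(n, prefix) < cutoff]


if __name__ == "__main__":
    names = ["logs-2024.01.01", "logs-2024.03.31", "logs-2024.04.01"]
    print(expired_indices(names, today=date(2024, 6, 30)))
```

The point is that an index only becomes eligible for deletion once every document in it is past the retention period, so deleting it is a cheap drop of a whole index and recent data is never touched.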
As of now, I'm not creating time-based indices. All the output goes to one single index. This means that once the index is 3 months old it would be deleted, even though it received a document 5 minutes ago.
Sorry if I'm not clear, this is pretty new to me. Please tell me if I could phrase it better.
No worries, thanks for explaining your current approach!
Have a read of this page of the docs. It gives a practical example of how this would work, and the overall approach is the same with Logstash.