Hello,
I'm currently trying to delete indices which are either greater than 50 GB or older than 31 days, but it seems like this is not possible with ILM.
If a rollover is defined in the hot phase, ILM uses the days since rollover to check whether the indices should be deleted.
If nothing is defined in the hot stage, it uses the creation date.
Is there any chance I can make ILM check either greater than 50 GB or creation date > 31 days?
Let's say I use the recommended settings, which are "Roll over when an index is 30 days old or any primary shard reaches 50 gigabytes". This means the index rolls over, and the delete-phase timer starts, either once it is 30 days old or once it reaches 50 GB.
But can I configure ILM, so it deletes the data after 30 days, no matter if it rolled over or not? So:
index1 is 74 gb, gets rolled over after 17 days, 2nd shard is 24 gb. Both get deleted after 30 days.
index2 is 20gb, doesn't get rolled over, gets deleted after 30 days.
When using ILM the goal is generally to keep at least X days worth of data in the cluster. As ILM deletes complete indices it is not exact, but it is considered acceptable to at times hold a bit more than X days rather than less. The newest data in an index roughly corresponds to the rollover time, which is why this is used in the calculation.
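To make the calculation concrete, here is a minimal sketch of how the delete timing behaves as described in this thread: the age is counted from the rollover time if the index has rolled over, otherwise from its creation date. The function name and the dates are made up for illustration; this is not ILM's actual code.

```python
from datetime import datetime, timedelta
from typing import Optional


def delete_eligible_at(created: datetime,
                       rolled_over: Optional[datetime],
                       min_age: timedelta) -> datetime:
    """Return the earliest time the delete phase could start (hypothetical helper)."""
    # Age is measured from rollover when there was one, else from creation.
    origin = rolled_over if rolled_over is not None else created
    return origin + min_age


created = datetime(2024, 1, 1)
rolled = created + timedelta(days=17)  # e.g. rolled over early because of size

# Rolled-over index: timer starts at rollover, not at creation.
print(delete_eligible_at(created, rolled, timedelta(days=31)))
# Index without rollover: timer starts at creation.
print(delete_eligible_at(created, None, timedelta(days=31)))
```

In the first case the oldest documents are about 48 days old when the index goes away, while the newest ones are just past 31 days.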
So I can either live with the fact that data could be stored for up to 62 days (31 days until rollover + 31 days in the delete phase) or create a second ILM policy and try to add the specific indices to it, right?
I definitely need a 31-day delete phase because my data needs to be stored for a minimum of 31 days.
Adjust the rollover period relative to how much extra data you are willing to hold in the cluster. If you roll over at 7 days you will hold at most 7 days of extra data. If, however, you have a very long retention period, 31 days worth of extra data might be acceptable as it is only a small percentage.
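The trade-off can be put into numbers with a small sketch. It assumes the delete age is counted from rollover and ignores any delay from ILM's periodic checks; the function name is hypothetical.

```python
def retention_bounds(rollover_max_age_days: int, delete_min_age_days: int):
    """Return (guaranteed minimum, worst-case maximum) data age in days."""
    # Newest document in an index is written right before rollover,
    # so it lives for at least the delete min_age.
    minimum = delete_min_age_days
    # Oldest document was written right after index creation, so it can
    # additionally wait the full rollover period before the timer starts.
    maximum = rollover_max_age_days + delete_min_age_days
    return minimum, maximum


for rollover_days in (1, 7, 31):
    low, high = retention_bounds(rollover_days, 31)
    print(f"rollover at {rollover_days}d: data kept between {low} and {high} days")
```

With a 31-day delete phase, a 7-day rollover keeps data for at most 38 days, while a 31-day rollover gives the 62 days mentioned above.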
Well, I need to hold data for 31 days before deleting. I got two scenarios:
indices which get larger than 50 gb
indices which do not
Because I need the data for 31 days, I have to set the delete phase to 31 days. If I set it any lower and the rollover gets triggered after, say, a day because the index is already bigger than 50 GB, the data gets deleted too early.
Which is why I thought about doing two ILM rules with:
hot phase rollover 50 gb, deleting after 31 days (for indices that get bigger than 50 gb)
hot phase rollover 30 days, deleting after 1 day (for indices smaller than 50 gb)
This does not make sense to me. I do not understand what you are trying to achieve.
Typically you specify max age as well as max shard/index size for the rollover. If you get a lot of data it rolls over based on index/shard size and if the data volume is low it rolls over based on age, as this is reached before the size limit. This will give you indices of different size covering different time periods. If you are required to keep at least 31 days of data you set the delete phase to this, and it will be based on the rollover timestamp. This way you will keep at most one index extra at any time, and the max size in terms of age and size is given by your rollover settings.
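As a rough sketch, a single policy along these lines could look like the following, written here as a Python dict that serializes to the JSON body for `PUT _ilm/policy/<name>`. The values (30d, 50gb, 31d) are taken from this thread and are assumptions about your requirements, not a recommendation; adjust `max_age` to control how much extra data you hold.

```python
import json

# Assumed values from the discussion above; tune them to your own needs.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        # Roll over on whichever condition is met first.
                        "max_age": "30d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                # Counted from the rollover timestamp once the index has rolled over.
                "min_age": "31d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(policy, indent=2))
```

High-volume indices roll over on size and low-volume ones on age, and both are deleted 31 days after their rollover, so no second policy is needed.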