I have implemented an ILM (Index Lifecycle Management) policy with a time-based strategy, where indices such as my-data-index-api-dev-*
are automatically deleted after 7 days. Currently, this setup deletes indices after the specified time period, and new indices are created with the same naming pattern. However, I would like to update the policy to delete indices based on storage volume instead of time, while ensuring the indices continue to follow the same naming pattern and strategy. How can I configure the policy to achieve this?
I don't think this is possible; the delete phase is always based on the age of the index, not on its size.
What you can do is change the rollover in the hot phase to be based on volume, but then your indices could be rolled over before or after 7 days, depending on the ingest volume.
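As an illustration, here is a minimal sketch of such a policy; the policy name and the 50gb/7d thresholds are placeholders you would adapt to your cluster:

```
# Roll over on size (or after 7 days as a fallback), delete 7 days after rollover
PUT _ilm/policy/my-data-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Note that once rollover is used, the min_age of the delete phase is measured from the rollover time, not from the index creation time.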
If a single index becomes full in Elasticsearch, it can lead to data loss as Elasticsearch may drop incoming data due to insufficient space. How can we create an ILM (Index Lifecycle Management) policy to prevent this and manage the situation effectively?
You risk losing data if one or more nodes reach the flood-stage watermark, which by default is 95% of disk usage on the data path.
When this happens, Elasticsearch marks every index that has at least one shard on that node as read-only, which can lead to data loss in some cases.
To avoid this scenario you need to monitor your nodes and take action before it happens.
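For reference, here are a couple of requests you can run in Kibana Dev Tools; the index pattern in the second one is just an example, and recent Elasticsearch versions remove the block automatically once disk usage drops below the high watermark again:

```
# Check disk usage per node to see how close you are to the watermarks
GET _cat/allocation?v

# After freeing disk space, remove the read-only block that the
# flood-stage watermark added to the affected indices
PUT my-data-index-api-dev-*/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```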
ILM helps you manage the size of your indices, but it does not take any action to prevent a node from reaching the flood-stage watermark; it only rolls over and deletes your indices according to the configured policies.
What you can do is adjust your policies to roll over or delete your indices earlier, so your nodes do not reach the flood-stage watermark.
What is your average index size? How much free space does your cluster have? Is your lifecycle policy based only on age, or also on size?
As mentioned, you can only delete based on age, but you can roll over based on size.
I have implemented a policy to delete indices after 7 days, and it is working as expected. However, for scenarios where a large number of hits causes the index to fill up, I have also created a policy based on volume to address this.
Currently, my index follows the naming pattern my-data-index-api-dev-, which includes indices for specific dates (e.g., my-data-index-api-dev-2024.11.28). After applying the policy, will the index remain under the same pattern (my-data-index-api-dev-), or will a new index be created? If a new index is created, how can I ensure that logs continue to be ingested into it seamlessly? Is there a solution to handle this situation effectively?
You can only have one policy attached to an index, so it is not clear what this other policy you mention is; as noted, you also cannot delete by volume. Can you provide more context on what you mean here?
I would recommend that you check the rollover documentation to understand what a rollover is and how it works.
Basically, when you use rollovers in an ILM policy, you write to an alias and Elasticsearch manages the backing indices. You can have a date pattern in the name of the backing indices, but it will match the date of the rollover; you will not have daily indices anymore.
You can use an alias named `my-data-index-api-dev`, and your backing indices would be something like `my-data-index-api-dev-2024.12.04-000001`, where the `000001` suffix increments every time the index rolls over.
When you use rollovers, you need to send the indexing requests to the alias, and Elasticsearch will know which backing index is the current write index for that alias.
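To give a rough idea of how the pieces fit together, here is a sketch for Kibana Dev Tools; the template and policy names are placeholders, and the URL-encoded index name in the second request is just the date-math expression <my-data-index-api-dev-{now/d}-000001>:

```
# Index template so every new backing index picks up the policy and alias
PUT _index_template/my-data-template
{
  "index_patterns": ["my-data-index-api-dev-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-data-policy",
      "index.lifecycle.rollover_alias": "my-data-index-api-dev"
    }
  }
}

# Bootstrap the first backing index and mark it as the write index
PUT %3Cmy-data-index-api-dev-%7Bnow%2Fd%7D-000001%3E
{
  "aliases": {
    "my-data-index-api-dev": {
      "is_write_index": true
    }
  }
}
```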
For example, if you use Logstash, your `index` option should be `index => "your-index-alias"` and not `index => "your-index-name-%{+YYYY.MM.dd}"` as used for daily indices.
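A minimal sketch of the corresponding Logstash output, assuming the alias from the example above (the hosts value is a placeholder):

```
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    # Write to the rollover alias, not to a dated index name;
    # Elasticsearch resolves the current write index behind it
    index => "my-data-index-api-dev"
  }
}
```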
I would say that the solution is to move away from daily indices and use rollover by size, letting Elasticsearch create new backing indices when needed.
I believe Curator has the ability to delete indices based on total size on disk like you are describing. You can use this instead of ILM for these indices, but be aware that it runs outside Elasticsearch and has to be invoked periodically, e.g. through a cron job.
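For illustration, here is a sketch of what such a Curator action file could look like; the 50 GB threshold and the index prefix are placeholders, so double-check the options against the Curator documentation for your version:

```
# actions.yml - run periodically (e.g. from cron) via the curator CLI
actions:
  1:
    action: delete_indices
    description: >-
      Delete the oldest my-data-index-api-dev-* indices once their
      combined size exceeds 50 GB
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: my-data-index-api-dev-
      - filtertype: space
        disk_space: 50
        use_age: True
        source: creation_date
```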
During a load testing session, we encountered a situation where logs and traces were not visible on the Kibana UI, and the Stack Management page was inaccessible. To resolve this issue, we accessed the VM and manually deleted some indices to free up space, which restored the Kibana UI functionality. To prevent such scenarios in production, where data loss is unacceptable, we need to implement proactive measures such as monitoring disk usage, setting up index lifecycle policies (ILM) for automated index management, and ensuring sufficient storage capacity to handle load spikes.
Could you help us overcome this situation or suggest a solution for this?