In our setup we have been using curator for a long time, but recent features (namely cold nodes with searchable snapshots) are not supported by it. I want to point out the issues we are having with ILM and the workaround approaches we tried out.
the general problem
Unlike curator, ILM only handles indexes individually. If you want ILM to delete the indexes with your oldest logs, for example, you can only do that based on the age of the index. An age-based condition cannot prevent your cluster from running out of space when your log volume increases unexpectedly.
With that, you are forced either to overprovision your cluster (increasing your permanent costs for resources that are unused most of the time) or to have someone woken up at night whenever your cluster is running full (who will most likely just delete the oldest logs manually anyway).
Unfortunately, the issues addressing this problem have seen no progress for almost 2 years:
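To make the age-only limitation concrete, here is a minimal sketch of the kind of ILM policy body I mean, built as a Python dict. The helper name and the 30d value are just examples I made up; the point is that the only condition available for the delete phase is min_age.

```python
import json


def build_age_based_delete_policy(min_age):
    """Build an ILM policy body whose delete phase depends only on index age.

    Hypothetical helper for illustration; min_age is an ILM time value
    such as "30d".
    """
    return {
        "policy": {
            "phases": {
                "delete": {
                    # The only trigger: the index has reached this age.
                    # Nothing here looks at disk usage.
                    "min_age": min_age,
                    "actions": {"delete": {}},
                }
            }
        }
    }


# Body for a PUT _ilm/policy/<policy name> request.
print(json.dumps(build_age_based_delete_policy("30d"), indent=2))
```

However large the ingest volume gets, an index under this policy stays around until it is 30 days old.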
- https://github.com/elastic/elasticsearch/issues/49392
- https://github.com/elastic/elasticsearch/issues/44001
what curator can do (better)
You can still use curator if you just need to perform basic actions like:
- rollover when indexes get too big
- allocate to different nodes (hot to warm)*
- delete old indexes*
*Unlike with ILM, you can run these actions with space filters, which select indexes based on the disk space they use in your cluster. This can reliably ensure that your cluster doesn't run out of disk space, independent of the ingest volume of logs!
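The idea behind such a space-based selection can be sketched in a few lines of Python. This is not curator's actual code, only an illustration of the logic under my own assumptions: the IndexInfo type, the function name and the byte budget are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexInfo:
    name: str
    creation_epoch: int  # creation time in seconds since the epoch
    size_bytes: int      # disk space used by the index


def select_indices_to_delete(indices, max_total_bytes):
    """Return the names of the oldest indexes to delete so the rest fit the budget."""
    total = sum(index.size_bytes for index in indices)
    to_delete = []
    # Walk from oldest to newest and drop indexes until the total fits.
    for index in sorted(indices, key=lambda i: i.creation_epoch):
        if total <= max_total_bytes:
            break
        to_delete.append(index.name)
        total -= index.size_bytes
    return to_delete


example = [
    IndexInfo("logs-03", creation_epoch=3, size_bytes=40),
    IndexInfo("logs-01", creation_epoch=1, size_bytes=40),
    IndexInfo("logs-02", creation_epoch=2, size_bytes=40),
]
print(select_indices_to_delete(example, max_total_bytes=100))
```

The important difference to an age condition is that the number of deleted indexes adapts to how much data actually arrived.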
what curator cannot do
Unfortunately, development on curator has basically stopped. Because of that, new features of Elasticsearch are only supported by ILM.
One recent feature where this is the case is searchable snapshots, used on cold data nodes.
The action would require multiple steps to be executed, including mounting of the snapshot, which curator cannot do.
using curator together with ILM
Since curator can update index settings, you can actually use it to apply a certain ILM policy to an index. But that introduces a few problems:
This approach sounds like it would solve all the issues discussed, by just creating multiple ILM policies and using curator to switch between them. Unfortunately, changing the lifecycle policy of an index has no immediate effect, by design: the index won't leave the phase of its old policy until that phase ends, and the ending condition can again only be time based. If it is the final phase of the policy, it never ends at all.
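For reference, the settings update this relies on is small. Below is a minimal sketch of the request body that assigns a policy through the index.lifecycle.name setting; the helper and the policy name are hypothetical.

```python
import json


def lifecycle_name_settings(policy_name):
    """Build an index settings body that assigns an ILM policy to an index."""
    return {"index": {"lifecycle": {"name": policy_name}}}


# Body for a PUT <index>/_settings request; the policy name is made up.
print(json.dumps(lifecycle_name_settings("logs-short-retention"), indent=2))
```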
Another thing I tried, in order to let curator move an index to the next phase of an ILM policy, was updating its index.lifecycle.origination_date, which ILM should use to determine the index age. But that is not usable as a workaround either.
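For anyone who wants to reproduce the attempt: the setting takes a timestamp in epoch milliseconds. Here is a minimal sketch of how such a settings body can be built; the helper name and the example date are my own.

```python
import json
from datetime import datetime, timezone


def origination_date_settings(when):
    """Build a settings body that sets index.lifecycle.origination_date.

    The setting expects milliseconds since the epoch, so a timezone-aware
    datetime is required to avoid depending on the local timezone.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    millis = int(when.timestamp() * 1000)
    return {"index": {"lifecycle": {"origination_date": millis}}}


# Pretend the index was created on 2021-01-01 (example date only).
body = origination_date_settings(datetime(2021, 1, 1, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```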
I'd be happy for any feedback and other ideas that may work better than the approaches we tried so far!