Is it possible to create an index policy that will continuously delete index data older than a certain time frame?
Something like the TTL feature, which was deprecated, but which allowed individual documents to be deleted based on their age instead of deleting the whole index?
No. ILM deletes complete indices only.
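For reference, a minimal sketch of an ILM policy that drops whole indices after a retention period (the policy name and the 30-day threshold are placeholders, not anything prescribed here):

```
PUT _ilm/policy/delete_after_30d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Any index managed by this policy is deleted in its entirety once it reaches the `min_age`; there is no per-document granularity.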
So there is no way to handle this with built-in ELK features other than manually building cron jobs that delete old data from indices through the API?
Yes, I believe so.
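For anyone taking that route, such a cron job would typically call the delete-by-query API with a range filter on the timestamp field. A sketch (the index name, field name, and 30-day cutoff are assumptions):

```
POST my-index/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-30d"
      }
    }
  }
}
```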
Thank you very much for the answers!
Is it better to use the API to delete old data inside each index, or to split each index into time-based indices (daily, weekly, ...) so they can simply be dropped?
What is the trade-off in query performance between the two deletion approaches (drop vs. delete), and what is the trade-off of having many more indices to work with and query because they are time-based (considering the starting number of indices is in the thousands)?
When running delete-by-query each document needs to be individually deleted, which will result in merging and is a lot less efficient than deleting a complete index. Time-based indices do however generally assume that you are indexing immutable data, as updates can be inefficient.
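Dropping a complete time-based index, by contrast, is a single near-instant metadata operation with no merging involved (the daily index name below is just an example):

```
DELETE /filebeat-2021.01.15
```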
Why would you need thousands of indices? Often your indices can be more coarse-grained than your query window, and you adjust for this by filtering on the query window.
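As a sketch of that pattern: a monthly index can still serve a one-day query window simply by filtering on the timestamp (the index name and field name are assumptions):

```
GET logs-2021.01/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d",
        "lt": "now"
      }
    }
  }
}
```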
Yes, the data is immutable.
The requirement for that many indices comes from the need to support multi-tenancy, with each tenant able to run numerous Filebeat/Metricbeat module sources on multiple hosts (#tenants × #hosts × #metric/filebeat modules). Separating each source into an individual index, the total number gets quite high even without using time-based indices.
Consolidate indices as far as you can, as having lots of small indices and shards can be very inefficient and cause performance problems.
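One common consolidation pattern, sketched here with entirely hypothetical names: instead of one index per tenant/host/module, keep a shared index where tenant and module are document fields, and filter at query time:

```
GET beats-shared/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tenant": "acme" } },
        { "term": { "module": "nginx" } }
      ]
    }
  }
}
```

Term filters like these are cached and cheap, so the query cost is usually far lower than the overhead of thousands of tiny shards.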
Thank you again very much for the answers!
One more question regarding the number of indices. Is there a certain threshold at which the number of indices starts to seriously hinder performance, or does it depend on the data and the cluster?
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.