Is it possible to create an index policy that will continuously delete index data older than a certain time frame?
Something like the TTL feature, which was deprecated, but which allowed individual documents to be deleted based on their age instead of deleting the whole index?
No. ILM deletes complete indices only.
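For reference, a minimal sketch of an ILM policy that drops whole indices after a retention period (the policy name and the 30-day threshold are placeholders, not anything prescribed here):

```
PUT _ilm/policy/delete_after_30d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Any index managed by this policy is deleted in its entirety once it reaches the `min_age`; there is no per-document granularity.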
So there is no way to handle this with built-in ELK features other than manually building cron jobs that delete old data from indices through the API?
Yes, I believe so.
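For anyone taking that route, such a cron job would typically call the delete-by-query API with a range filter on the timestamp field. A sketch (the index name, field name, and 30-day cutoff are assumptions):

```
POST my-index/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-30d"
      }
    }
  }
}
```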
Thank you very much for the answers!
Is it better to use the API to delete old data inside each index, or to split each index into time-based indices (daily, weekly, ...) so they can simply be dropped?
What is the trade-off in query performance between the two deletion approaches (drop vs. delete), and what is the trade-off of having many more indices to work with and query because they are time-based (considering the starting number of indices is in the thousands)?
When running delete-by-query each document needs to be individually deleted, which will result in merging and is a lot less efficient than deleting a complete index. Time-based indices do however generally assume that you are indexing immutable data, as updates can be inefficient.
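Dropping a complete time-based index, by contrast, is a single near-instant metadata operation with no merging involved (the daily index name below is just an example):

```
DELETE /filebeat-2021.01.15
```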
Why would you need thousands of indices? Often your indices can be more coarse-grained than your query window, and you adjust for this by filtering on the query window.
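As a sketch of that pattern: a monthly index can still serve a one-day query window simply by filtering on the timestamp (the index name and field name are assumptions):

```
GET logs-2021.01/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d",
        "lt": "now"
      }
    }
  }
}
```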
Yes, the data is immutable.
The requirement for that many indices comes from the need to support multi-tenancy, with each tenant able to run numerous Filebeat/Metricbeat module sources on multiple hosts (#tenants × #hosts × #metric/filebeat modules). Separating each source into an individual index, the total number gets quite high even without using time-based indices.
Consolidate indices as far as you can, as having lots of small indices and shards can be very inefficient and cause performance problems.
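One common consolidation pattern, sketched here with entirely hypothetical names: instead of one index per tenant/host/module, keep a shared index where tenant and module are document fields, and filter at query time:

```
GET beats-shared/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tenant": "acme" } },
        { "term": { "module": "nginx" } }
      ]
    }
  }
}
```

Term filters like these are cached and cheap, so the query cost is usually far lower than the overhead of thousands of tiny shards.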
Thank you again very much for the answers!
One more question regarding the number of indices. Is there a certain threshold at which the number of indices starts to seriously hinder performance, or does it depend on the data and the cluster?
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.