Hello,
I am pretty new to Elasticsearch and have some questions about retention policies.
Suppose I implement a retention policy that, for example, deletes data from an index once it is older than 5 days. How does Elasticsearch handle this? When the retention task kicks in and starts deleting old data, does it impact performance? Could CPU or RAM usage ramp up? I would guess the retention policy runs as a background task with low priority and is not very hardware-intensive, but I am not sure.
And how is it handled when new data comes in while old data is being deleted at the same time? I would guess this is buffered in some way.
Elasticsearch ILM assumes the use of time-based indices and works by deleting complete indices once the data within them is older than the retention period. It does not support deleting select data from within an index.
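To make the "time-based indices" idea concrete, here is a minimal sketch of an ILM policy body, built as a Python dict, that rolls over to a new index daily and deletes each whole index 5 days later. The phase/action structure (`hot` with `rollover`, `delete` with `min_age` and a `delete` action) follows the ILM policy API (`PUT _ilm/policy/<name>`); the 1-day rollover threshold and the policy name are illustrative assumptions. Note that rollover requires writing through a data stream or a write alias, and `min_age` is measured from rollover time (or from index creation if the index was never rolled over), not from the timestamps of the documents inside.

```python
import json


def build_retention_policy(retention: str = "5d", rollover_age: str = "1d") -> dict:
    """Build an ILM policy body that deletes whole indices after `retention`.

    Assumption: indices roll over every `rollover_age`, so each index holds
    roughly that much data and can be dropped as one unit.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index once the current one is this old.
                        "rollover": {"max_age": rollover_age}
                    }
                },
                "delete": {
                    # Age is counted from rollover (or creation if never rolled over).
                    "min_age": retention,
                    # Deletes the entire index, not individual documents.
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for e.g. PUT _ilm/policy/logs-5d-retention (hypothetical policy name).
    print(json.dumps(build_retention_policy(), indent=2))
```

With this layout, no document-level deletion ever happens: on any given day, only the indices whose age has passed `min_age` are removed as a whole.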
So, if I understand correctly, when the retention policy deletes data, it removes a whole group of data (an index) that is older than the retention period?
But how is the delete task run? Does it use all of the system's CPU and RAM if necessary to complete? Or is it handled in the background, so that if something more important comes in, the delete task is paused for a moment?
Deleting complete indices, as ILM does, is very lightweight and consumes hardly any resources. It does not delete data from within an index but rather removes the full index.
ILM drops the full index, so it is very quick: the index is removed from the cluster state and its files on disk are deleted. The size of the index does not really matter.
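A toy model (not Elasticsearch code) of why size does not matter: if each daily index is tracked by name and creation date, retention only has to decide which names to drop, and the work is one removal per index no matter how many documents it holds. The `logs-YYYY.MM.DD` naming and the in-memory dict standing in for the cluster are hypothetical; the age check mirrors the "age at least `min_age`" idea.

```python
from datetime import date


def apply_retention(cluster: dict, index_dates: dict, today: date,
                    retention_days: int = 5) -> list:
    """Drop whole indices whose age is at least `retention_days`.

    `cluster` maps index name -> list of documents (a stand-in for real data);
    `index_dates` maps index name -> creation date. Both are hypothetical.
    """
    expired = sorted(
        name for name, created in index_dates.items()
        if (today - created).days >= retention_days
    )
    for name in expired:
        # One removal per index, independent of its document count:
        # the documents are never visited individually.
        del cluster[name]
        del index_dates[name]
    return expired


if __name__ == "__main__":
    cluster = {
        "logs-2024.05.01": [{"msg": "a"}] * 1_000_000,  # large index
        "logs-2024.05.05": [{"msg": "b"}],
        "logs-2024.05.08": [{"msg": "c"}],
    }
    dates = {
        "logs-2024.05.01": date(2024, 5, 1),
        "logs-2024.05.05": date(2024, 5, 5),
        "logs-2024.05.08": date(2024, 5, 8),
    }
    print(apply_retention(cluster, dates, today=date(2024, 5, 10)))
```

The million-document index costs the same single `del` as the one-document index, which is the point the reply makes about dropping indices.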
If you were deleting data from within the index, e.g. with delete by query, the size would matter and a lot more resources would be used.
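The difference shows up in the requests themselves. Below is a minimal sketch that only builds (method, path, body) tuples for the two approaches, without sending anything; `DELETE /<index>` and `POST /<index>/_delete_by_query` with a `range` query are the real endpoints, while the `@timestamp` field name and the index names are assumptions for illustration. Delete by query has to find and mark every matching document as deleted, and the disk space is only reclaimed later when Lucene merges segments.

```python
import json
from typing import Optional, Tuple


def delete_index_request(index: str) -> Tuple[str, str, Optional[dict]]:
    # Drops the whole index (what ILM's delete action does): no per-document work.
    return ("DELETE", f"/{index}", None)


def delete_by_query_request(index: str, retention: str = "5d",
                            field: str = "@timestamp") -> Tuple[str, str, Optional[dict]]:
    # Matches and marks each old document as deleted; cost grows with the
    # number of matches, and space is freed only when segments merge later.
    # Assumption: documents carry their time in `field`.
    body = {"query": {"range": {field: {"lt": f"now-{retention}"}}}}
    return ("POST", f"/{index}/_delete_by_query", body)


if __name__ == "__main__":
    for method, path, body in (delete_index_request("logs-2024.05.01"),
                               delete_by_query_request("logs")):
        print(method, path, json.dumps(body) if body else "")
```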