I have read a few threads on this question. Most of them suggest using delete-by-query, because the Rollover API seems to delete indices (or move them to the next phase) and create new ones based on a given condition.
I could not find a way to make Rollover / ILM behave like delete-by-query, which works as described below:
ILM works by deleting complete indices. It cannot be used to delete data from within an index. If you need to delete some data from within an index, delete-by-query is what you need. You will need to trigger and run these requests yourself though. There is no way to schedule them from within Elasticsearch.
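A minimal sketch of what such a self-triggered cleanup might look like, in Python. It only builds the path and body for the `_delete_by_query` endpoint; actually sending the request on a schedule (for example from a cron job) is left out. The index name `logs_index`, the `@timestamp` field, and the 30-day retention are assumptions for illustration.

```python
import json


def build_delete_by_query(index, retention_days, timestamp_field="@timestamp"):
    """Build the path and body of a delete-by-query request that removes
    documents older than `retention_days`.

    The timestamp field name is an assumption; use whatever date field
    your documents actually carry.
    """
    path = f"/{index}/_delete_by_query"
    body = {
        "query": {
            "range": {
                # Elasticsearch date math: "now-30d" is 30 days before now
                timestamp_field: {"lt": f"now-{retention_days}d"}
            }
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = build_delete_by_query("logs_index", 30)
    print("POST", path)
    print(json.dumps(body, indent=2))
```

The request has to be issued from outside Elasticsearch each time old documents should be removed, which is the manual step ILM avoids by dropping whole indices.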
He mentioned: "I highly recommend looking into the Rollover API for a way to simplify this for you. Then you can make your "non-time-series" index into a time-series index for all intents and purposes."
I didn't quite understand how Rollover can help in this use case.
Yes, but what Rollover does is delete the existing index (or move it to the next phase) based on a condition and create a new index as the write index, right? That way, a single index would not hold, say, the past 30 days of data at any given time.
For example, let's say this is the policy below, with the flow of indices shown in the next image.
With rollover you have a set of indices that hold the data covering the retention period, and you query all of them at the same time through an index pattern or alias. Once the oldest index only contains data that is beyond the retention period, it will be deleted. This means that at any point in time you may have a bit more data available than your retention period specifies, but that is generally not a problem.
I do not see what the issue is. This is how most people manage retention in Elasticsearch.
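To make the retention behaviour described above concrete, here is a small Python simulation. It is only a sketch under stated assumptions: time is counted in whole days, rollover is triggered purely by age, and an index is deleted once the time since its rollover reaches the retention period. The function name and parameters are hypothetical and not part of any Elasticsearch API.

```python
def indices_on_day(day, rollover_days, retention_days):
    """Return the (first_day, last_day) data span of every index that
    still exists on `day`, oldest first.

    Assumptions: a new write index is created every `rollover_days`, and
    an index is deleted once `retention_days` have passed since it was
    rolled over (so its newest document is at least that old).
    """
    live = []
    start = 0
    while start <= day:
        end = start + rollover_days       # the index stops receiving writes here
        rolled_over = end <= day
        if not rolled_over:
            live.append((start, day))     # current write index
        elif day - end < retention_days:
            live.append((start, end))     # rolled over, still within retention
        # otherwise every document in the index is past retention: deleted
        start = end
    return live


if __name__ == "__main__":
    # 2-day rollover with 2-day retention, inspected on day 7
    print(indices_on_day(7, rollover_days=2, retention_days=2))
```

With these numbers, two indices exist on day 7, covering days 4 to 6 and days 6 to 7. The oldest document is 3 days old even though the retention is 2 days, which is the "bit more data than your retention period specifies" effect.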
So let's consider the policy and the flow of indices through ILM (shown in the 2 images I sent in my previous post).
Let's assume I want the logs index to always keep the last 2 days of data (yesterday and today), just like on Amazon, where at any time you can see the orders you placed in the last 3 months.
The hot phase has a max age of 2 days before rolling over to the warm phase.
So on day 1 and day 2, if I need to view the last 2 days of data, I have only 1 index/data stream under the alias name "logs_index", for example.
On day 3, when rollover happens and a new index is created, the older one goes to the warm phase.
Now if I need to view the last 2 days of data, I would have to query the new index and part of the old index.
Similarly, on day 7 I would have to query index no. 4 and index no. 3, and so on, based on the policy in the image in my previous comment.
So how would Elasticsearch know which physical indices to query to get exactly the last 2 days of data based on timestamps?
If you are viewing your data through Kibana and set the time picker to last 2 days, Kibana will query all the indices backing the data stream/index pattern with a time filter added to the query. This means data will in practice only be returned from the last 2 indices. Querying a set of indices that does not hold any relevant data based on the time range is very fast so it is not a performance issue.
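Outside Kibana, a search of roughly the same shape can be sent by hand. The Python sketch below only builds the path and body of a `_search` request with a range filter on the timestamp; it is not the exact request Kibana generates. The target `logs_index*` (a wildcard over the backing indices) and the `@timestamp` field are assumptions for illustration.

```python
import json


def build_time_filtered_search(target, days, timestamp_field="@timestamp"):
    """Build the path and body of a search restricted to the last `days`
    days. `target` may be an alias, a data stream, or a wildcard pattern;
    the timestamp field name is an assumption.
    """
    path = f"/{target}/_search"
    body = {
        "query": {
            "bool": {
                # filter context: matches documents without scoring them
                "filter": [
                    {
                        "range": {
                            timestamp_field: {
                                "gte": f"now-{days}d",
                                "lte": "now",
                            }
                        }
                    }
                ]
            }
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = build_time_filtered_search("logs_index*", 2)
    print("GET", path)
    print(json.dumps(body, indent=2))
```

The request names every matching index, and the time filter decides which documents come back, so the caller never has to work out which physical indices hold the last 2 days.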
Do you mean I cannot query using the alias name, because under the alias there would be only 1 active/write index in the hot phase and only that one would be queried? In other words, can the alias only query the index in the hot phase?
If so, to query the other indices as well, I would need to use the regex (index pattern) name.