I need to roll over the index on a daily basis. Currently I am using the Curator rollover action (max_age: 1d) and my index pattern is test-YYYY.MM.DD-1. My cron job runs once every hour.
Our scenario: the index is created at 4:30 and data is ingested continuously. I need the rollover to happen at 00:00 the next day, but it only happens after 4:30 the next day. Because of this, the next day's data (2019-07-25) is ingested into the previous day's index (test-2019-07-24).
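The timing described here follows from an age condition being measured from the index's creation time rather than from midnight. A minimal sketch of the arithmetic, assuming a one-day age condition and a cron job that fires at minute 0 of every hour (the function name and the cron minute are hypothetical, chosen only for illustration):

```python
from datetime import datetime, timedelta


def first_rollover_run(created, max_age, cron_minute=0):
    """Return the first hourly cron run at which the index age has reached max_age.

    Assumption: the cron job fires once per hour at `cron_minute`.
    """
    # The index only becomes eligible once its age reaches max_age.
    eligible = created + max_age
    # Snap to the cron slot in the same hour, then move forward if that slot
    # falls before the index is eligible.
    run = eligible.replace(minute=cron_minute, second=0, microsecond=0)
    if run < eligible:
        run += timedelta(hours=1)
    return run


created = datetime(2019, 7, 24, 4, 30)
print(first_rollover_run(created, timedelta(days=1)))  # 2019-07-25 05:00:00
```

Under these assumptions, an index created at 4:30 cannot roll over before 4:30 the next day, so everything ingested between 00:00 and the rollover lands in the previous day's index.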
How can I ingest data based on its timestamp, so that data from the 25th goes into the index for the 25th?
Assume each document contains a timestamp field.
Our use case is to delete data older than n days. For that I am planning to use day-based indices (daily rollover using a Curator rollover job). I want the 25th's data in the 25th's index only, and so on.
Also, I would like to know how Curator calculates the age of an index.
Please explain.
Why, when Elasticsearch can query multiple indices at once and still limit the results to a specific date range? This is needlessly complex. It's not a big deal to still do rollover indices (even daily ones at UTC 0:00) and just delete them on day 26, so day 25 is still present.
It's terribly complex to try to get every single event into Elasticsearch on precisely the correct date index if you're using anything other than UTC timing. That's why I ask.
If we roll over once daily, we will have multiple indices (test-day1, test-day2, test-day3, ...). If we need to query only the last day's data, there is no point in querying all the indices (test*). If each index holds exactly one day's data (day 1 data in test-day1), it is easy to query that one index alone (test-day1) instead of querying every index for the last day's data.
Also, if an index contains some of the second day's data, we will lose that data when deleting indices.
To avoid these scenarios, I need each day's data to be in its own index.
I still wouldn't recommend that approach, due to the complexity involved. From where I sit, you're adding a lot of complexity on the ingest side to save yourself a tiny bit of work on the query side.
Querying all indices isn't necessarily a bother, thanks to updates to Elasticsearch since 6.0. Also, the easiest thing is to add a date range filter to your query that limits it to the desired time frame. Filters are exceptionally fast, and a document either is or is not within a time range.
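As an illustration of the date range filter, here is a minimal sketch that builds a search body restricted to a single UTC day. The field name `timestamp` and the helper name are assumptions; the body would be sent to a wildcard pattern such as `test-*` with whatever client you use.

```python
import json
from datetime import datetime, timedelta, timezone


def day_filter_query(day, field="timestamp"):
    """Build a search body matching documents whose `field` falls on `day` (UTC).

    `field` is an assumed name; substitute the timestamp field in your mapping.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "query": {
            "bool": {
                # A filter clause does no scoring: a document is in range or it is not.
                "filter": [
                    {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}
                ]
            }
        }
    }


body = day_filter_query(datetime(2019, 7, 25))
print(json.dumps(body, indent=2))
```

Because the range is half-open (`gte` the start of the day, `lt` the start of the next), a document belongs to exactly one day no matter which index it was written to.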
Don't delete an index until every part of the index is acceptable to delete. Curator provides an extension to the age filter that can calculate the age of the oldest or youngest/newest document in an index. This is called field_stats, and was originally based on the Elasticsearch 5.x field stats query. It now simply performs an aggregation for these values. Using this allows you to ensure an index will not be deleted until its youngest document is safely within your desired threshold.
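The deletion rule above can be sketched in plain Python. This is not Curator's code or API, only the decision logic: an index is deletable once its youngest document is older than the retention window. The index names, timestamps, and the 25-day retention are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def deletable_indices(youngest_doc, now, retention_days):
    """Return the indices whose youngest document is older than the retention window.

    `youngest_doc` maps index name -> timestamp of the newest document in it
    (what a max aggregation on the timestamp field would return).
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, ts in youngest_doc.items() if ts < cutoff)


utc = timezone.utc
youngest = {
    # Hypothetical indices that rolled over at 04:30 rather than at midnight.
    "test-2019.07.24-1": datetime(2019, 7, 25, 4, 29, tzinfo=utc),
    "test-2019.07.25-2": datetime(2019, 7, 26, 4, 29, tzinfo=utc),
}
now = datetime(2019, 8, 20, tzinfo=utc)
print(deletable_indices(youngest, now, retention_days=25))  # ['test-2019.07.24-1']
```

With a cutoff of 2019-07-26 00:00 UTC, the second index survives because it still holds documents from inside the window, even though it was created before the cutoff.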