Daily index maintenance

Hi,

In my current application, I'm rolling over data based on a doc limit. I'm planning to switch to creating daily indices in my platform. I would like to know how daily indices would affect the following:

  1. Processing old data. For example, in my application I expect old data to arrive with a latency of XX hrs. So once the new index is created for the current date, I can still expect some of yesterday's data. But since the current day's index is the active one, how can I maintain yesterday's data in my current index? Retention will also be a problem in this scenario.

  2. With daily indices, how will the response time for search be affected? How can I ensure a request goes only to the few shards of the day for which the request was issued?

I'm planning to follow the below link:

Thanks
Ankita

  1. You'd need to use time-based indices and then have your processing layer send the delayed data to the older index.
  2. It depends on how you are querying the indices.
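
As a rough illustration of point 1, a processing layer could derive the target index from the event's own timestamp rather than from the processing time. This is only a sketch: the `logs` prefix, the `daily_index_name` helper, the `YYYY.MM.dd` suffix, and the sample date are assumptions made for the example, not something taken from your setup.

```python
from datetime import datetime, timezone


def daily_index_name(event_time: datetime, prefix: str = "logs") -> str:
    """Return the daily index for the day the event occurred (in UTC)."""
    if event_time.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        event_time = event_time.replace(tzinfo=timezone.utc)
    return f"{prefix}-{event_time.astimezone(timezone.utc):%Y.%m.%d}"


# An event that happened on 9 May but is processed on 10 May
# is still routed to the 9 May index.
late_event = datetime(2018, 5, 9, 23, 30, tzinfo=timezone.utc)
print(daily_index_name(late_event))  # logs-2018.05.09
```

The returned name would then be used as the index in the indexing request, so late data lands in the older index regardless of when it is processed.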

Kibana used to limit the indices being queried, first by matching dates against the timestamp in the index name and later based on field stats. Improvements in Elasticsearch have meant that this is no longer required, and Kibana now sends the query to all shards matching the index pattern, so if you are on version 6.x you may not need to worry about this.
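
If you are on an older version, or query from your own client and still want to name the indices explicitly, the name-based approach can be sketched roughly as below. The `logs` prefix, the `indices_for_range` helper, and the date format are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import List


def indices_for_range(start: date, end: date, prefix: str = "logs") -> List[str]:
    """List the daily indices covering the inclusive range start..end."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [f"{prefix}-{start + timedelta(days=i):%Y.%m.%d}" for i in range(days + 1)]


names = indices_for_range(date(2018, 5, 9), date(2018, 5, 10))
# Elasticsearch accepts a comma-separated index list in the search path.
print(",".join(names))  # logs-2018.05.09,logs-2018.05.10
```

Only the shards of the listed indices are then involved in the search, instead of every index matching a wildcard pattern.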

@warkolm: How can the processing layer be made to process old data and send it to the older index? From what I understand, the daily indices are created based not on the contents of the data but on the time when the data was processed.
For example, if data is processed on 5/10 it will go to 5/10's index even when it contains 5/9's data.
Also, when I'm using an alias and a rollover happens on the daily index, the write alias points to the current index, so writing data to the old index is also a challenge. How can I overcome that?

If you are using Logstash then it'll automatically pick the right day's index as long as you have a date filter that sets the timestamp from the event's own date.

If you are using rollover, events that are processed late and indexed through the write alias will indeed end up in the current index. You could use index statistics to determine which index or indices the data should go to and index directly into them rather than through the alias, but there is no automatic way to do so.
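
A minimal sketch of that manual selection, assuming you have already collected the minimum and maximum event timestamp held by each rollover index (for example with a min/max aggregation per index). The `IndexRange` type, the `pick_index` helper, and the index names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class IndexRange(NamedTuple):
    name: str          # concrete index name behind the alias
    min_ts: datetime   # earliest event timestamp stored in the index
    max_ts: datetime   # latest event timestamp stored in the index


def pick_index(event_time: datetime, ranges: Sequence[IndexRange]) -> Optional[str]:
    """Return the first index whose time range contains the event, else None."""
    for r in ranges:
        if r.min_ts <= event_time <= r.max_ts:
            return r.name
    return None


ranges = [
    IndexRange("logs-000001", datetime(2018, 5, 9, 0, 0), datetime(2018, 5, 9, 23, 59)),
    IndexRange("logs-000002", datetime(2018, 5, 10, 0, 0), datetime(2018, 5, 10, 23, 59)),
]
print(pick_index(datetime(2018, 5, 9, 22, 15), ranges))  # logs-000001
```

When no range matches, `pick_index` returns `None`, and the caller could fall back to the write alias. The ranges would need refreshing periodically, since the current index keeps growing.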

If you have data coming in very late, it may be better for you to stick with indices matching fixed time periods based on the index name.
