In my current application, I'm rolling over data based on a doc limit. I'm planning to switch to creating daily indices in my platform. I would like to know how daily indices would affect the following:
Processing old data. For example, in my application I expect old data to arrive with a latency of XX hrs. So once the new index is created for the current date, I can still expect some of yesterday's data. But since the current day's index is the active one, how can I maintain yesterday's data in my current index? Retention will also be a problem in this scenario.
Search response time. With daily indices, how will the response time for search be affected? How can I ensure the request goes only to the few shards of the day for which the request was issued?
Kibana used to limit the indices being queried, first by using date match based on the timestamp in the index name and later based on field stats. Improvements in Elasticsearch have meant that this is no longer required, and Kibana now sends the query to all shards matching the index pattern, so if you are on version 6.x you may not need to worry about this.
@warkolm: How can the processing layer be made to process old data and send it to the older index? From what I understand, the daily indices are created based not on the contents of the data but on the time when the data was processed.
For example, if data is processed on 5/10, it will go to 5/10's index even when it contains 5/9's data.
Also, when I'm using an alias and a rollover happens on the daily index, the write alias points to the current index, so writing data to the old index is also a challenge. How can I overcome that?
If you are using rollover, events that are processed late and indexed into the write alias will indeed end up in the current index. Based on index statistics, you could determine which index or indices the data should go to and index directly into them rather than through the alias, but there is no automatic way to do so.
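As a rough sketch of that manual routing, assume the processing layer already holds a timestamp range per rollover index (collected however you prefer, for example from a min/max query on the timestamp field). The function name, the index names and the ranges below are all hypothetical; nothing here is an Elasticsearch API.

```python
from datetime import datetime, timezone

def pick_index(event_time, index_ranges, write_alias):
    """Return the index whose [start, end) range covers event_time.

    index_ranges is a hypothetical dict of index name -> (start, end)
    that the processing layer maintains itself. Events that match no
    known range fall back to the write alias.
    """
    for name, (start, end) in index_ranges.items():
        if start <= event_time < end:
            return name
    return write_alias

# Hypothetical ranges for two rollover indices.
utc = timezone.utc
ranges = {
    "logs-000001": (datetime(2018, 5, 9, tzinfo=utc), datetime(2018, 5, 10, tzinfo=utc)),
    "logs-000002": (datetime(2018, 5, 10, tzinfo=utc), datetime(2018, 5, 11, tzinfo=utc)),
}

late_event = datetime(2018, 5, 9, 22, 15, tzinfo=utc)
target = pick_index(late_event, ranges, "logs-write")
```

The late event is then indexed directly into `target` instead of through the alias. Keeping the ranges up to date after each rollover is left to the processing layer.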
If you have data coming in very late, it may be better for you to stick with indices matching fixed time periods based on the index name.
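A minimal sketch of what that could look like in the processing layer: derive the index name from the event's own timestamp rather than from the time of processing, so 5/9's data goes to 5/9's index even when it is processed on 5/10. The `logs` prefix and the `YYYY.MM.DD` suffix are assumptions for illustration; use whatever naming your indices already follow.

```python
from datetime import datetime, timezone

def index_for_event(event_time, prefix="logs"):
    """Build a daily index name from the event's own timestamp.

    The prefix and the date format are illustrative assumptions.
    Timestamps are normalised to UTC so the day boundary is consistent.
    """
    day = event_time.astimezone(timezone.utc)
    return f"{prefix}-{day:%Y.%m.%d}"

# An event generated on 5/9 but processed on 5/10 still maps to 5/9's index.
event_time = datetime(2018, 5, 9, 23, 30, tzinfo=timezone.utc)
index_name = index_for_event(event_time)
```

With names tied to fixed time periods, retention also becomes simpler, since a whole day's index can be dropped once it ages out.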