I need to create one index per 30-day interval with ILM, for a total of 60+ indices covering 5 years of data.
That way, when searching within a date range, I could search only the indices created within that range, which should reduce response time compared with searching all 60+ indices.
As I understand from this GitHub post and this Elastic Discuss post about creating a new daily index (at midnight, in my case), Index Lifecycle Management is not the solution for this.
Q: How can I achieve this? Is there an alternative solution?
Before ILM and rollover existed, it was considered best practice to have time-based indices covering fixed time periods. One would include the time period identifier in the index name and thus be able to determine exactly which indices needed to be queried based on the time window the query addressed.
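To illustrate that older naming approach, here is a minimal sketch of working out which fixed-period indices a query window touches. The prefix `mydata`, the anchor date and the `prefix-YYYY.MM.DD` naming are assumptions made up for this example; only the 30-day period comes from the question.

```python
from datetime import date, timedelta

PERIOD_DAYS = 30            # fixed 30-day periods, as described in the question
ANCHOR = date(2020, 1, 1)   # hypothetical first day of the first period
PREFIX = "mydata"           # hypothetical index name prefix


def period_start(bucket: int) -> date:
    """Return the first day of the given 30-day bucket."""
    return ANCHOR + timedelta(days=bucket * PERIOD_DAYS)


def indices_for_range(first: date, last: date) -> list[str]:
    """Return the names of all indices overlapping [first, last], inclusive."""
    if last < first:
        raise ValueError("last must not be before first")
    first_bucket = (first - ANCHOR).days // PERIOD_DAYS
    last_bucket = (last - ANCHOR).days // PERIOD_DAYS
    return [
        f"{PREFIX}-{period_start(b):%Y.%m.%d}"
        for b in range(first_bucket, last_bucket + 1)
    ]


if __name__ == "__main__":
    # A window from mid-January to early February spans two periods.
    print(",".join(indices_for_range(date(2020, 1, 15), date(2020, 2, 5))))
```

The resulting comma-separated list could be used as the index target of a search request. This only works if every index really does cover exactly its named period, which is the constraint rollover removes.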
As Elasticsearch evolved, it became much better at quickly determining whether an index contains any data matching the queried time interval. As a result, it became best practice to simply query all matching indices and let Elasticsearch exclude, behind the scenes, the indices that have no matching data. This allowed rollover to be introduced, as indices were no longer required to cover fixed and deterministic time periods.
Rollover has the benefit of allowing new underlying indices to be created based on index age or size. If your data volumes vary over time, this lets you achieve a more uniform shard size, which is generally recommended for optimal performance. It means indices will not be cut off at specific time boundaries, but that is generally not required.
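As a rough sketch of what rollover on age or size could look like, the snippet below builds the JSON body of an ILM policy. The policy name and both thresholds are illustrative assumptions, not recommendations from this thread. `max_age` and `max_primary_shard_size` are rollover conditions in recent Elasticsearch versions; older versions may only offer `max_size`, so check the documentation for your version.

```python
import json

# Hypothetical policy name, chosen only for this example.
POLICY_NAME = "mydata-rollover"


def build_rollover_policy(max_age: str = "30d", max_shard_size: str = "50gb") -> dict:
    """Build an ILM policy body that rolls over on whichever limit is hit first."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            # Roll over once the index reaches this age ...
                            "max_age": max_age,
                            # ... or once a primary shard grows to this size.
                            "max_primary_shard_size": max_shard_size,
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # The printed body would be sent with: PUT _ilm/policy/<POLICY_NAME>
    print(f"PUT _ilm/policy/{POLICY_NAME}")
    print(json.dumps(build_rollover_policy(), indent=2))
```

With both conditions set, a quiet period rolls over on age while a busy one rolls over on size, which is what keeps shard sizes more even.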
I would recommend adopting rollover if your data is immutable, and not trying to control exactly when rollover occurs. I would also recommend testing the difference between having a query target all backing indices and just a few. Do not just assume there will be a significant difference.
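One way to run that comparison is a small timing harness like the sketch below. It is client-agnostic: `run_query` is a hypothetical callable you would supply yourself, wrapping whatever search call you use against your cluster. The repeat and warm-up counts are arbitrary assumptions.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(
    run_query: Callable[[], object],
    repeats: int = 20,
    warmup: int = 3,
) -> float:
    """Return the median wall-clock latency of run_query in milliseconds."""
    # Warm-up calls are not timed, so caches are populated before measuring.
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - started) * 1000.0)
    return statistics.median(samples)
```

You would call it twice with the same query: once with a callable that targets the alias or pattern covering all backing indices, and once with a callable that targets only the few indices you believe hold the data. Comparing the two medians on your own data shows whether the difference is worth the extra management.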
Hi, I am a bit curious here. If I have 100 indices in my Elasticsearch cluster and the data I am searching for resides in one of them, will the search not be faster if I specify the index name in the query than if I don't?
I do understand that we should test the performance. But my assumption was that there would be a considerable difference in search time, especially if the indices are huge (e.g. an index size of 1 TB).
Yes, it may be faster. The question is whether the difference in latency is large enough to warrant all the additional work and management that would be required to achieve and manage it.