We currently use a date pattern in the index name to split documents into monthly indices. This lets an application query only the index (or indices) covering a given start/end date, instead of hitting every index and shard.
Is it possible to do something similar with data streams? Is there a strategy for avoiding hitting all shards based on a date criterion (for example, routing by @timestamp)?
Kibana used to do this, but the feature was later removed once hitting all indices had been made a lot more efficient. Have you tested how much difference it makes in your use case? For example, clear the caches and query all indices, then clear the caches again and query only the required indices, and compare the two.
In the past, we noticed high load and CPU usage in the cluster when clients issued multiple search requests that hit many indices. We solve this by building the search URL dynamically from the date parameters, so that each request hits only the relevant indices.
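For anyone reading along, here is a minimal sketch of that approach. The `logs-` prefix, the `yyyy.MM` suffix format, and the function names are assumptions made for illustration, not our actual naming scheme.

```python
from datetime import date


def monthly_indices(start: date, end: date, prefix: str = "logs-") -> list[str]:
    """Return one index name per calendar month touched by [start, end].

    The prefix and the yyyy.MM suffix are hypothetical; adjust them to
    match the real index naming pattern.
    """
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month containing `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_path(start: date, end: date, prefix: str = "logs-") -> str:
    """Build a search path that targets only the months in the date range."""
    return "/" + ",".join(monthly_indices(start, end, prefix)) + "/_search"
```

A request for 15 November 2023 to 3 January 2024 would then target three monthly indices rather than all of them. If some months may not have an index yet, adding `ignore_unavailable=true` to the query string should keep the request from failing on the missing names.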