I'm certain the default answer to this question is "It depends", but I am trying to determine an effective approach to working with multiple indices.
Rather than asking "is it better...", I'd like to ask: when does it make more sense to run a single query across multiple indices, incorporating data from disparate index fields, versus running individual queries tailored to each index and then combining the results by their scores?
I'll give you an example of when we had to move from "one search across multiple indices" to "multiple searches, each against an individual index". We have a use case that runs aggregations over many dimensions (more than 10) on high-cardinality fields. We store one index per day, and one of the dimensions was "date". So when we ran that aggregation over, say, one month, it produced a huge number of aggregation buckets and the response was enormous. Most of the time, the node coordinating that query would die with an OOM (out-of-memory) error.
Other than that, I can't think of any reason not to query multiple indices in a single HTTP request.
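A minimal Python sketch of the per-index approach described above, assuming daily indices named like `logs-2024.01.31` (the `logs-` prefix and date format are hypothetical) and a hypothetical `run_agg` callable that sends one aggregation request to one index and returns its buckets as `{key: doc_count}`. Each request covers only one day, so no single coordinating node has to build a whole month of buckets at once; the merge happens client-side instead.

```python
from collections import Counter
from datetime import date, timedelta


def daily_index_names(start, end, prefix="logs-"):
    """Return one index name per day in [start, end].

    The naming scheme (prefix + YYYY.MM.DD) is a hypothetical example.
    """
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


def aggregate_per_index(index_names, run_agg):
    """Run the aggregation once per daily index and sum bucket counts client-side.

    run_agg(index_name) is a hypothetical callable that issues one search
    request against one index and returns {bucket_key: doc_count}.
    """
    merged = Counter()
    for name in index_names:
        merged.update(run_agg(name))
    return merged


# Usage with a fake run_agg standing in for real search requests.
names = daily_index_names(date(2024, 1, 30), date(2024, 2, 2))
fake_results = {name: {("host-a", name): 3} for name in names}
totals = aggregate_per_index(names, lambda name: fake_results[name])
```

Summing counts like this is only exact if each per-index response returns every bucket; if the aggregation caps the number of buckets (as a terms aggregation's `size` does), the merged counts can be incomplete. Because "date" was itself one of the dimensions here, each day's bucket keys do not overlap with other days', so the merge is mostly a concatenation.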
One other thing to consider is that your relevancy/matching criteria apply per query. So if you run a single query across indices, you get one consolidated list of results ranked by those criteria.
On the other hand, multiple queries give you multiple result sets, and their scores may have no relationship to each other.
So if you are looking to return a single ranked set of search results, a single query makes sense.
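To illustrate why scores from separate queries may not relate to each other, here is a small Python sketch that merges two hypothetical hit lists (each hit a dict with `_id` and `_score`, loosely mimicking the shape of search hits). Merging by raw score lets whichever query happens to produce larger numbers dominate. Rescaling each list to [0, 1] first (min-max normalization, a common heuristic swapped in here, not something the search engine does for you) puts the lists on a common scale, but it still does not make the two relevance criteria genuinely comparable.

```python
def merge_by_raw_score(*hit_lists):
    """Concatenate hits from separate queries and sort by raw _score, highest first."""
    merged = [hit for hits in hit_lists for hit in hits]
    return sorted(merged, key=lambda h: h["_score"], reverse=True)


def min_max_normalize(hits):
    """Rescale one query's scores to [0, 1]; all hits get 1.0 if scores are equal."""
    if not hits:
        return []
    scores = [h["_score"] for h in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [{**h, "_score": (h["_score"] - lo) / span if span else 1.0} for h in hits]


def merge_normalized(*hit_lists):
    """Normalize each query's hits separately, then merge by the rescaled score."""
    return merge_by_raw_score(*(min_max_normalize(hits) for hits in hit_lists))


# Hypothetical results from two separate per-index queries.
hits_a = [{"_id": "a1", "_score": 12.0}, {"_id": "a2", "_score": 9.0}]
hits_b = [{"_id": "b1", "_score": 1.5}, {"_id": "b2", "_score": 0.5}]

raw_order = [h["_id"] for h in merge_by_raw_score(hits_a, hits_b)]
normalized_order = [h["_id"] for h in merge_normalized(hits_a, hits_b)]
```

With raw scores every hit from the first query outranks every hit from the second, purely because of score magnitude; after normalization the two lists interleave. Neither ordering is "correct", which is the point of preferring a single query when you need one consolidated ranking.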