I have 2 clusters:
Cluster A: has 1 big index storing 500 GB of data
Cluster B: has monthly indices of 30 GB each, covering 15 months
There is a search query that searches a single index in Cluster A and, in Cluster B, the read alias [image-2019-10, image-2019-11, ..., image-2021-06].
According to the search latency metrics in cluster health:
Cluster B has 4-6 ms of search latency per shard.
Cluster A has 30-40 ms of search latency per shard.
But overall the query takes 10 seconds in Cluster A and 20 seconds in Cluster B.
I think Cluster B uses threads internally to search multiple indices in parallel, so the overall time taken by the search query on the alias should be lower. Am I missing something?
Yes. Imagine looking up a word by hand in a 5000-page dictionary, and compare that to trying to find the same word in ten 500-page dictionaries each of which contains a random tenth of the words. On average you'd find it quicker in the one big dictionary, even though your per-dictionary search time would be slower (e.g. 5000-page books are heavy and hard to use).
If there were a few of you working together you might be able to achieve better performance, but coordinating multiple lookups in parallel across multiple people takes significant time and effort too. Parallelising the work only takes you so far; doing less work (using a more efficient data structure) is the way to go.
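To put rough numbers on the dictionary analogy, here is a minimal Python sketch. It assumes each "dictionary" is a sorted list searched by binary search, and that the small dictionaries are checked one at a time until the word turns up. The function names and sizes (50,000 words split into ten parts) are hypothetical and not taken from either cluster, and this models the analogy only, not how Lucene actually looks up terms.

```python
import random


def comparisons(sorted_words, word):
    """Binary-search sorted_words; return (comparisons made, whether word was found)."""
    lo, hi = 0, len(sorted_words)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_words[mid] < word:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_words) and sorted_words[lo] == word
    return steps, found


def search_one_big(big, word):
    """Look the word up in the single big dictionary."""
    steps, _ = comparisons(big, word)
    return steps


def search_many_small(smalls, word):
    """Check each small dictionary in turn, stopping once the word is found."""
    total = 0
    for d in smalls:
        steps, found = comparisons(d, word)
        total += steps
        if found:
            break
    return total


def simulate(n_words=50_000, n_parts=10, n_queries=1_000, seed=42):
    """Return average comparisons per lookup: (one big dictionary, many small ones)."""
    rng = random.Random(seed)
    words = list(range(n_words))  # integers stand in for words; already sorted
    parts = [[] for _ in range(n_parts)]
    for w in words:
        # Each word lands in exactly one random part; appending in increasing
        # order keeps every part sorted.
        parts[rng.randrange(n_parts)].append(w)
    queries = [rng.randrange(n_words) for _ in range(n_queries)]
    big_avg = sum(search_one_big(words, q) for q in queries) / n_queries
    small_avg = sum(search_many_small(parts, q) for q in queries) / n_queries
    return big_avg, small_avg


if __name__ == "__main__":
    big_avg, small_avg = simulate()
    print(f"one big dictionary:    {big_avg:.1f} comparisons per lookup")
    print(f"ten small dictionaries: {small_avg:.1f} comparisons per lookup")
```

The one big dictionary needs about 15 to 16 comparisons per lookup, while the ten small ones need several times that in total. Handing the ten searches to ten people would cut the time each person spends, but the total work goes up and someone still has to coordinate and combine the results.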