Hi, we currently use Elasticsearch for our backend and have noticed that msearch performs significantly better than a single query with multiple filter aggregations. I'm curious why that is.
For a query that returns ~130 million results, a single request that computes the sum aggregation of two numeric fields under multiple filter aggregations takes around 10-11 seconds, while splitting it into multiple subqueries and sending them in one multi-search (msearch) request takes around 2-3 seconds.
The trend seems to be that msearch outperforms filter aggregations on queries with many results, but filter aggregations beat msearch on queries with approximately 2 million or fewer results. Is this because msearch runs its subqueries in parallel? If so, why do filter aggregations beat msearch on queries with fewer results? For reference, at around 1.3m results, filter aggregations beat msearch with an average of 0.33s vs 0.5s, respectively. Every other filter-aggregation request that returns fewer than 1.3m results also beats its msearch equivalent. (Note: the only filter in the search is a date range filter, and the sub-filters are chunks of the main date range.)
Does anyone here know the reason behind this? Thanks in advance.
I suspect this is due to the fact that aggregations visit all matches of the query. This means that running, e.g., a filter aggregation on foo:bar will need to visit every document that matches the query and, for each one, check whether it matches foo:bar.
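To make the comparison concrete, here is a rough sketch of the two request shapes being discussed, built as plain Python dicts. The index name `my-index` and the field names `timestamp`, `field_a` and `field_b` are made up for illustration (the original post does not give them), and the helper functions are hypothetical, not part of any client library.

```python
import json

# Hypothetical names: "timestamp", "field_a", "field_b" and "my-index"
# are placeholders, not taken from the original post.

def date_chunks(bounds):
    """Pair consecutive boundaries into (start, end) date chunks."""
    return list(zip(bounds[:-1], bounds[1:]))

def range_filter(start, end):
    """Date range clause covering [start, end)."""
    return {"range": {"timestamp": {"gte": start, "lt": end}}}

def sums():
    """Sum aggregations over the two numeric fields."""
    return {
        "sum_a": {"sum": {"field": "field_a"}},
        "sum_b": {"sum": {"field": "field_b"}},
    }

def single_request(bounds):
    """One search: the main date range as the query, one filter aggregation per chunk."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": range_filter(bounds[0], bounds[-1]),
        "aggs": {
            f"chunk_{i}": {"filter": range_filter(start, end), "aggs": sums()}
            for i, (start, end) in enumerate(date_chunks(bounds))
        },
    }

def msearch_body(bounds, index="my-index"):
    """One sub-search per chunk, serialized as the newline-delimited _msearch body."""
    lines = []
    for start, end in date_chunks(bounds):
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({"size": 0, "query": range_filter(start, end), "aggs": sums()}))
    return "\n".join(lines) + "\n"  # the body must end with a newline

if __name__ == "__main__":
    bounds = ["2020-01-01", "2020-02-01", "2020-03-01"]
    print(json.dumps(single_request(bounds), indent=2))
    print(msearch_body(bounds), end="")
```

In the first shape, every document matching the main range is collected and tested against each chunk's filter. In the second, each sub-search's query only matches its own chunk.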