Filter Aggregation vs Msearch

jdgenio · January 23, 2018, 4:08am

Hi, we currently use Elasticsearch for our backend and have noticed that msearch performs significantly better than a single query with multiple filter aggregations. I'm curious as to why this is so.

For a query that matches ~130 million documents, a single request that computes the sum aggregation of two numeric fields under multiple filter aggregations takes around 10-11 seconds, while splitting this query into multiple subqueries and sending them in one multisearch request takes around 2-3 seconds.

The trend seems to be that filter aggregations are outperformed by msearch on queries with many results, but beat msearch on queries with approximately 2 million or fewer results. Is this because msearch runs in parallel? If so, why do aggregations beat msearch on queries with fewer results? For reference, at around 1.3 million results, filter aggregations beat msearch with an average of 0.33s vs 0.5s, respectively. Every other request with filter aggregations that returns fewer than 1.3 million results also beats its msearch equivalent. (Note: the only filter in the search is a date range filter, and the subfilters are chunks of the main date range.)

Does anyone here know the reason behind this? Thanks in advance.

jpountz · January 23, 2018, 6:16pm

I suspect this is due to the fact that requests visit all matches of the query. This means that running e.g. the request below will need to visit all documents and, for each one, check whether it matches foo:bar.

{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "my_filter": {
      "filter": {
        "term": { "foo": "bar" }
      },
      "aggs": // your aggs
    }
  }
}

The query below, in contrast, can iterate directly over only the documents that match foo:bar, ignoring the rest:

{
  "query": {
    "term": {
      "foo": "bar"
    }
  },
  "aggs": // your aggs
}

The latter is faster, especially if the filter only matches a minority of the documents.
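The difference between the two requests can be illustrated with a toy cost model. This is only a sketch of the reasoning, not how Elasticsearch is implemented: the function names, the `value` field and the in-memory `docs_by_term` lookup are all made up for illustration.

```python
from collections import defaultdict


def filter_agg_visits(docs, query, agg_filter):
    """Filter aggregation: every query match is visited, then tested against the filter."""
    visited = 0
    total = 0
    for doc in docs:
        if not query(doc):
            continue
        visited += 1  # the document is visited even if the filter rejects it
        if agg_filter(doc):
            total += doc["value"]
    return visited, total


def filtered_query_visits(docs_by_term, term):
    """Filter used as the query: a term lookup yields only the matching documents."""
    matching = docs_by_term.get(term, [])
    return len(matching), sum(doc["value"] for doc in matching)


# Hypothetical data: 1000 documents, 10% of which have foo:bar.
docs = [{"foo": "bar" if i % 10 == 0 else "baz", "value": 1} for i in range(1000)]
docs_by_term = defaultdict(list)
for doc in docs:
    docs_by_term[doc["foo"]].append(doc)

agg_visited, agg_sum = filter_agg_visits(
    docs, lambda d: True, lambda d: d["foo"] == "bar"
)
query_visited, query_sum = filtered_query_visits(docs_by_term, "bar")
print(agg_visited, agg_sum)      # 1000 100
print(query_visited, query_sum)  # 100 100
```

Both approaches produce the same sum, but in this model the filter aggregation touches ten times as many documents.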

jdgenio · January 24, 2018, 12:02am

Hi Adrien, sorry for not including my request body, but I'm sure what you said is not the case here. Here is the request body:

{  
    "query":{  
        "bool":{  
            "filter":{  
                "range":{  
                    "Arrival Date":{  
                        "gte":1136044800,
                        "lte":1505779200
                    }
                }
            }
        }
    },
    "size":0,
    "aggregations":{  
        "Year 1":{  
            "filter":{  
                "range":{  
                    "Arrival Date":{  
                        "gte":1136044800,
                        "lte":1167580800
                    }
                }
            },
            "aggregations": {
                "stat 1": {
                    "sum": {
                        "field": "field 1"
                    }
                },
                "stat 2": {
                    "sum": {
                        "field": "field 2"
                    }
                }
            }
        }
        ... (year 2 to year x until 1505779200 is reached) ...
    }
}

As you can see, the main query is already filtered, so the search will not be revisited in each filter aggregation.
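For reference, a request body of this shape could be generated rather than written by hand. The sketch below is an assumption about how one might do that: `range_filter` and `build_filter_agg_request` are hypothetical helpers, and only the "Year 1" bucket is filled in because the remaining boundaries are not given in the post.

```python
import json


def range_filter(field, gte, lte):
    """Build a range filter clause on one field."""
    return {"range": {field: {"gte": gte, "lte": lte}}}


def build_filter_agg_request(field, start, end, buckets, sum_fields):
    """Build one search body with a filter aggregation per bucket.

    buckets: list of (name, gte, lte) tuples
    sum_fields: mapping of sub-aggregation name to the field it sums
    """
    sums = {name: {"sum": {"field": f}} for name, f in sum_fields.items()}
    aggs = {
        name: {"filter": range_filter(field, gte, lte), "aggregations": sums}
        for name, gte, lte in buckets
    }
    return {
        "query": {"bool": {"filter": range_filter(field, start, end)}},
        "size": 0,
        "aggregations": aggs,
    }


# Only Year 1 is known from the post; further buckets would be appended here.
buckets = [("Year 1", 1136044800, 1167580800)]
body = build_filter_agg_request(
    "Arrival Date",
    1136044800,
    1505779200,
    buckets,
    {"stat 1": "field 1", "stat 2": "field 2"},
)
print(json.dumps(body, indent=2))
```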

jdgenio · January 24, 2018, 12:05am

Here is its msearch equivalent:

[
    { "index": "myindex", "type": "mytype" }
    {
        "query": {
            "bool": {
                "filter":{  
                    "range":{  
                        "Arrival Date":{  
                            "gte":1136044800,
                            "lte":1167580800
                        }
                    }
                }
            }
        }
    },
    ... (loop from year 2 to year x)
]
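The array notation above is shorthand: the `_msearch` endpoint expects newline-delimited JSON, with a header line followed by a body line for each search and a trailing newline at the end. A minimal sketch of building that payload follows. `build_msearch_body` is a hypothetical helper, and it assumes each sub-request also carries `"size": 0` and the two sum aggregations from the single-request version, which the snippet above does not show.

```python
import json


def build_msearch_body(index, doc_type, field, buckets, sum_fields):
    """Build an NDJSON payload with one header line and one body line per bucket.

    buckets: list of (gte, lte) tuples
    sum_fields: mapping of aggregation name to the field it sums (assumed, see above)
    """
    sums = {name: {"sum": {"field": f}} for name, f in sum_fields.items()}
    lines = []
    for gte, lte in buckets:
        header = {"index": index, "type": doc_type}  # header copied from the post
        search = {
            "query": {
                "bool": {"filter": {"range": {field: {"gte": gte, "lte": lte}}}}
            },
            "size": 0,             # assumption: hits are not needed
            "aggregations": sums,  # assumption: same sums as the single request
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(search))
    # The payload must end with a newline.
    return "\n".join(lines) + "\n"


# Only the first year is known from the post; further ranges would be appended here.
payload = build_msearch_body(
    "myindex",
    "mytype",
    "Arrival Date",
    [(1136044800, 1167580800)],
    {"stat 1": "field 1", "stat 2": "field 2"},
)
print(payload, end="")
```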

system · February 21, 2018, 12:05am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
