Filter Aggregation vs Msearch


(Rafhael Genio) #1

Hi, we currently use Elasticsearch for our backend and have noticed that msearch performs significantly better than a single query with multiple filter aggregations. I'm curious as to why.

For a query that returns ~130 million results, a single request that computes the sum aggregation of two numeric fields under multiple filter aggregations takes around 10-11 seconds, while splitting the query into multiple subqueries and sending them in one multisearch (msearch) request takes around 2-3 seconds.

The trend seems to be that filter aggregations are outperformed by msearch for queries with many results, but beat msearch for queries with approximately 2 million or fewer results. Is this because msearch runs its subqueries in parallel? If so, why do filter aggregations beat msearch for queries with fewer results? For reference, at around 1.3m results, filter aggregations beat msearch with an average of 0.33s vs 0.5s, respectively. Every request with filter aggregations that returns fewer than 1.3m results also beats its msearch equivalent. (Note: the only filter in the search is a date range filter, and the subfilters are chunks of the main date range.)

Does anyone here know the reason behind this? Thanks in advance.


(Adrien Grand) #2

I suspect this is due to the fact that aggregations run over every document that matches the query. This means that running e.g. the request below will need to visit all documents and, for each one, check whether it matches foo:bar.

{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "my_filter": {
      "filter": {
        "term": { "foo": "bar" }
      },
      "aggs": // your aggs
    }
  }
}

The query below, by contrast, can directly iterate over only the documents that match foo:bar, ignoring the rest:

{
  "query": {
    "term": {
      "foo": "bar"
    }
  },
  "aggs": // your aggs
}

The latter is faster, especially if the filter only matches a minority of the documents.
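
To make the difference concrete, here is a toy Python model of the two access patterns. It is only a sketch and not how Elasticsearch is implemented internally: the in-memory `docs` list, the `build_postings` helper, and the field names `foo` and `n` are all made up for illustration. It counts how many documents each approach has to touch to compute the same sum.

```python
from collections import defaultdict

def build_postings(docs, field):
    """Toy inverted index: map each value of `field` to the ids of docs holding it."""
    postings = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        postings[doc[field]].append(doc_id)
    return postings

def sum_via_filter_agg(docs, field, value, metric):
    """match_all + filter aggregation: every document is visited and tested."""
    visited, total = 0, 0
    for doc in docs:
        visited += 1
        if doc[field] == value:
            total += doc[metric]
    return total, visited

def sum_via_query(docs, postings, value, metric):
    """term query + aggregation: only ids from the postings list are visited."""
    visited, total = 0, 0
    for doc_id in postings.get(value, []):
        visited += 1
        total += docs[doc_id][metric]
    return total, visited

# Hypothetical data: 1000 docs, one in ten has foo == "bar".
docs = [{"foo": "bar" if i % 10 == 0 else "baz", "n": i} for i in range(1000)]
postings = build_postings(docs, "foo")

print(sum_via_filter_agg(docs, "foo", "bar", "n"))  # same sum, 1000 docs visited
print(sum_via_query(docs, postings, "bar", "n"))    # same sum, 100 docs visited
```

Both functions return the same total; only the number of visited documents differs, and the gap widens as the filter becomes more selective.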


(Rafhael Genio) #3

Hi Adrien, sorry for not including my request body, but I'm fairly sure what you said is not the case here. Here is the request body:

{  
    "query":{  
        "bool":{  
            "filter":{  
                "range":{  
                    "Arrival Date":{  
                        "gte":1136044800,
                        "lte":1505779200
                    }
                }
            }
        }
    },
    "size":0,
    "aggregations":{  
        "Year 1":{  
            "filter":{  
                "range":{  
                    "Arrival Date":{  
                        "gte":1136044800,
                        "lte":1167580800
                    }
                }
            },
            "aggregations": {
                "stat 1": {
                    "sum": {
                        "field": "field 1"
                    }
                },
                "stat 2": {
                    "sum": {
                        "field": "field 2"
                    }
                }
            }
        }
        ... (year 2 to year x until 1505779200 is reached) ...
    }
}

As you can see, the main query is already filtered by the full date range, so each filter aggregation only runs over documents that already match the main query rather than over the whole index.
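
Since the request body above repeats the same pattern for every year, it could be generated rather than written by hand. The following is a minimal Python sketch under stated assumptions: `build_filter_agg_request` is a hypothetical helper, not part of any Elasticsearch client, and the outer range is simply taken as the minimum and maximum of the chunk bounds passed in. Only the "Year 1" bounds from the post are used, since the remaining years are not listed.

```python
import json

def build_filter_agg_request(date_field, chunks, sum_fields):
    """Build one search body with a filter aggregation per date chunk.

    chunks:     list of (agg_name, gte, lte) tuples (epoch seconds, inclusive)
    sum_fields: dict mapping sub-aggregation name -> numeric field to sum
    """
    # Outer query covers the full span of all chunks (assumption).
    overall_gte = min(gte for _, gte, _ in chunks)
    overall_lte = max(lte for _, _, lte in chunks)
    stats = {name: {"sum": {"field": field}} for name, field in sum_fields.items()}
    aggs = {
        name: {
            "filter": {"range": {date_field: {"gte": gte, "lte": lte}}},
            "aggregations": stats,
        }
        for name, gte, lte in chunks
    }
    return {
        "query": {
            "bool": {
                "filter": {
                    "range": {date_field: {"gte": overall_gte, "lte": overall_lte}}
                }
            }
        },
        "size": 0,
        "aggregations": aggs,
    }

# Only the first chunk is known from the post; further years would be appended here.
request = build_filter_agg_request(
    "Arrival Date",
    [("Year 1", 1136044800, 1167580800)],
    {"stat 1": "field 1", "stat 2": "field 2"},
)
print(json.dumps(request, indent=4))
```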


(Rafhael Genio) #4

Here is its msearch equivalent:

[
    { "index": "myindex", "type": "mytype" }
    {
        "query": {
            "bool": {
                "filter":{  
                    "range":{  
                        "Arrival Date":{  
                            "gte":1136044800,
                            "lte":1167580800
                        }
                    }
                }
            }
        }
    },
    ... (loop from year 2 to year x)
]
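
Note that the `_msearch` endpoint expects newline-delimited JSON (a header line followed by a body line for each subquery, ending with a newline) rather than a JSON array, so the listing above is best read as shorthand. A minimal Python sketch of how such a body could be assembled follows; `build_msearch_body` is a hypothetical helper, and the index and type names are the placeholders from the post.

```python
import json

def build_msearch_body(index, doc_type, sub_requests):
    """Serialize search bodies into the newline-delimited format used by _msearch.

    Each sub-request becomes two lines: a header naming the target, then the body.
    """
    header = {"index": index, "type": doc_type}
    lines = []
    for body in sub_requests:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The payload must end with a newline character.
    return "\n".join(lines) + "\n"

# First subquery from the post; the other years would be added to this list.
year_1 = {
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "Arrival Date": {"gte": 1136044800, "lte": 1167580800}
                }
            }
        }
    }
}

payload = build_msearch_body("myindex", "mytype", [year_1])
print(payload, end="")
```

The resulting string would be sent as the request body with the `Content-Type: application/x-ndjson` header.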
