Query optimisation: must vs must_not order


I use Elasticsearch 2.3.
I observed that with this simple query, the order of the must and must_not conditions in the query JSON matters:

  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": { "term": { "group_id": 1 } },
          "must_not": {
            "regexp": { "category": { "value": ".*value.*", "flags": "NONE" } }
          }
        }
      }
    }
  }

When must goes first, the query is more than 10 times faster than when the must_not key goes first in the bool object. I think this makes sense: the must condition significantly reduces the number of documents under consideration, so filtering the remainder by regexp is much faster than filtering by regexp first and then by group_id.

Now, when I run the same query for another group_id (e.g. group_id: 2), this optimisation doesn't work. Regardless of the order of must and must_not, it takes roughly the same amount of time, approximately equal to the time taken by the first, unoptimised query for the first group.
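To make the comparison reproducible, here is a minimal Python sketch that builds the same request body for any group with either key order. The helper names `build_query` and `clause_order` are mine, made up for illustration. Sending the bodies to the cluster and timing them is left out.

```python
import json


def build_query(group_id, must_first=True):
    """Build the request body with the bool clauses in the chosen key order."""
    must = ("must", {"term": {"group_id": group_id}})
    must_not = (
        "must_not",
        {"regexp": {"category": {"value": ".*value.*", "flags": "NONE"}}},
    )
    clauses = [must, must_not] if must_first else [must_not, must]
    # dict keeps insertion order (Python 3.7+), and json.dumps preserves it,
    # so the serialised JSON has the keys in exactly this order.
    return {"query": {"bool": {"filter": {"bool": dict(clauses)}}}}


def clause_order(body):
    """Return the key order of the inner bool object, for a quick sanity check."""
    return list(body["query"]["bool"]["filter"]["bool"])


if __name__ == "__main__":
    # Print all four variants: two groups, two key orders.
    for group_id in (1, 2):
        for must_first in (True, False):
            print(json.dumps(build_query(group_id, must_first)))
```

Each printed line can be used as a search request body as-is, so the only thing that differs between runs is the group_id and the key order.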

The number of documents in both groups is of the same order of magnitude, and the output is similar in that most records from the group are returned and only a small percentage are filtered out by the must_not condition.

My goal now is to compare the query execution profiles for both groups and figure out what makes group_id = 1 so special that the optimisation helps so much, and eventually to make the optimisation work for the second group (and all other groups).

I ran this query for both groups with profile: true but couldn't find any information that would help me figure this out.
I checked the Lucene query string produced for the second group, and regardless of whether the regexp condition comes before the group_id condition or not, the query takes roughly the same time.
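To compare the two profiles side by side, something like the following Python sketch could flatten each profile response into one row per query node. It assumes the response has the shape profile -> shards -> searches -> query, and that each node carries query_type, lucene, breakdown and optional children. I have not verified those field names against an actual 2.3 response, so they may need adjusting. The `SAMPLE` response, its Lucene strings and its numbers are invented for illustration, not real output.

```python
def flatten_profile(node, depth=0):
    """Yield (depth, query_type, lucene, total) for a node and all its children."""
    breakdown = node.get("breakdown", {})
    # Sum the timing entries; skip any "*_count" keys, which are not timings.
    total = sum(v for k, v in breakdown.items() if not k.endswith("_count"))
    yield depth, node.get("query_type"), node.get("lucene"), total
    for child in node.get("children", []):
        yield from flatten_profile(child, depth + 1)


def summarize(response):
    """Collect the flattened rows for every shard and search in a response."""
    rows = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                rows.extend(flatten_profile(root))
    return rows


# Hypothetical, hand-made response: structure assumed, values invented.
SAMPLE = {"profile": {"shards": [{"id": "hypothetical-shard", "searches": [{"query": [{
    "query_type": "BooleanQuery",
    "lucene": "+group_id:1 -category:/.*value.*/",
    "breakdown": {"next_doc": 10, "build_scorer": 20},
    "children": [
        {"query_type": "TermQuery", "lucene": "group_id:1",
         "breakdown": {"next_doc": 3, "build_scorer": 4}},
        {"query_type": "RegexpQuery", "lucene": "category:/.*value.*/",
         "breakdown": {"next_doc": 100, "build_scorer": 900}},
    ],
}]}]}]}}

if __name__ == "__main__":
    for depth, query_type, lucene, total in summarize(SAMPLE):
        print("  " * depth + f"{query_type} [{lucene}] total={total}")
```

Running `summarize` on the response for each group and comparing the rows per shard should show which node accounts for the difference, if the profile exposes it at all.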

Do you have any ideas how I can figure out what makes the difference between those groups and eventually make this query faster for the second group?

Thank you in advance!
