Bool filter performance and its alternatives

amreshakim · August 13, 2019, 4:29pm

Hi everyone,

I have been profiling a query that is noticeably slow in our ES setup.

Setup:
AWS, ES 7.3, 11 data nodes, 1 index, 22 shards total, total # of docs: ~350M

We store documents like this (~20 fields and most of them have a numeric type):

    {...
    "id1": long,
    "id2": long,
    status: byte # contains a number from [1..20]
    ...
    }

The following query gives us consistently high latency (~500ms-1second):

    {
      "query":{
        "bool":{
          "filter":[
            {
              "term":{
                "id1":{
                  "value":<id1>
                }
              }
            },
            {
              "term":{
                "id2":{
                  "value":<id2>
                }
              }
            },
            {
              "terms":{
                "status":[
                  1,
                  2,
                  3,
                  4,
                  5,
                  6,
                  7,
                  8,
                  9,
                  10,
                  11,
                  12,
                  13,
                  14,
                  15,
                  16,
                  17,
                  18,
                  19,
                  20
                ]
              }
            }
          ]
        }
      }
    }

# of hits: 22 documents

Profiling this query shows the majority of time is spent on the 3rd term (build_scorer to be more precise):
1 ms on 1st and 2nd terms and 500+ ms on the third one.

    {
      "type" : "PointInSetQuery",
      "description" : "status:{1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20}",
      "time" : "513.9ms",
      "time_in_nanos" : 513960388,
      "breakdown" : {
        "score" : 0,
        "build_scorer_count" : 78,
        "match_count" : 0,
        "create_weight" : 1483,
        "next_doc" : 0,
        "match" : 0,
        "create_weight_count" : 1,
        "next_doc_count" : 0,
        "score_count" : 0,
        "build_scorer" : 513924033,
        "advance" : 34767,
        "advance_count" : 26
      }
    }

This example is artificial and intentionally exaggerated since if we remove the 3rd term this query still returns 22 documents as we enumerated all possible values ([1..20]) of "status" field. However, based on what I've read the 3rd term in theory should've run against 22 documents only. I've tried different ways to express the same query, e.g., moved the 3rd term into "must", changed order and etc. The result is usually the same or even worse. If I leave just a subset of "status" values to compare against, the latency is slightly better but still at least 2 orders of magnitude higher than I'd expect.

Any help or pointers would be really appreciated!

Thanks.

system · September 10, 2019, 4:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Bool term filter slow Elasticsearch	2	758	November 14, 2017
Performance difference using bool with filter Elasticsearch	11	2235	July 5, 2017
Filter vs bool Query Performance , getting bad performance on using Filters Elasticsearch	7	2829	June 6, 2018
Super slow bool query with terms filter array Elasticsearch	7	1286	December 24, 2018
Slow bool query Elasticsearch	3	588	May 4, 2018

Bool filter performance and its alternatives

Related topics