Elasticsearch boolean term query latency increases when zero-match terms are added

I'm observing some surprising behavior with boolean term queries on Elasticsearch that I'd like to understand.
Each document in the index has several terms under the ev_tags field.
I'm issuing queries like the one below and see three distinct cases:

  1. All the terms in the query match at least one document in the index. In this case, the query latency increases as the number of terms increases (while the number of matched documents stays the same). This is expected, as there are more posting lists to go through.
  2. None of the terms in the query match any document. In this case, the query finishes very quickly with 0 documents, as the terms are not found in the inverted index.
  3. This is the case that surprised me. When I added terms that match zero documents to terms that do match documents, the query latency INCREASED. For instance, I issued a term query with 100 terms (all having matching documents). I then added 1000 non-matching terms to the query, and this increased the latency significantly. Given the query performance in case 2, I assumed that the 1000 terms would be thrown away immediately because they are not in the inverted index, and that the query performance would be the same as with only the 100 terms.

Any ideas on what can explain this behavior? Any help is much appreciated.
I'm using Elasticsearch 7.16.3.

Query

{
  "_source": false,
  "stored_fields": [
    "field1",
    "field2",
    "field3"
  ],
  "from": 0,
  "size": 10,
  "sort": [
    "_doc"
  ],
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "slice": "0"
              }
            },
            {
              "term": {
                "ev_tags": "<ev_term_1>"
              }
            },
            {
              "term": {
                "ev_tags": "<ev_term_2>"
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "term": {
                      "ev_tags": "<ev_term_3>"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
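
For anyone who wants to reproduce the three cases, here is a minimal sketch that generates request bodies shaped like the query above. The term names (`match_*`, `missing_*`) are hypothetical placeholders, and placing the extra terms in the nested `should` clause is an assumption, since the post does not say where they were added.

```python
import json


def build_query(must_terms, should_terms, slice_id="0", field="ev_tags"):
    """Build a search body with the same shape as the query in the post."""
    must = [{"term": {"slice": slice_id}}]
    must += [{"term": {field: t}} for t in must_terms]
    if should_terms:
        # Assumption: the variable-length term list goes into the nested should clause.
        should = [{"term": {field: t}} for t in should_terms]
        must.append({"bool": {"should": should}})
    return {
        "_source": False,
        "stored_fields": ["field1", "field2", "field3"],
        "from": 0,
        "size": 10,
        "sort": ["_doc"],
        "query": {"constant_score": {"filter": {"bool": {"must": must}}}},
    }


# Hypothetical term lists: replace with terms known to exist / not exist in the index.
matching = [f"match_{i}" for i in range(100)]
missing = [f"missing_{i}" for i in range(1000)]

case1 = build_query(matching[:2], matching[2:])            # every term matches
case2 = build_query(missing[:2], missing[2:])              # no term matches
case3 = build_query(matching[:2], matching[2:] + missing)  # matching plus non-matching

# Serialize one body to check that it is valid JSON for the _search endpoint.
# Note: Elasticsearch 7.x rejects bool queries with more clauses than
# indices.query.bool.max_clause_count (1024 by default), so that limit
# has to be raised before case 3 can run with this many terms.
payload = json.dumps(case3)
```

Sending the three bodies with the same number of `must` terms keeps the comparison focused on how the size of the `should` list affects latency.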

Hi

Elasticsearch doesn’t know they’re not in the index until it tries to find them. Those 1000 lookups could account for some time.
Note also that when benchmarking, it’s important to take the effect of the various caches into account and make sure you’re measuring uncached queries.

Appreciate the response @Mark_Harwood1

If it's the 1000 lookups that are taking the time, shouldn't case 2 (none of the query terms matching any index terms) also take a few seconds? In case 2, no matter how many terms I have in my query, the response is almost instantaneous. It's only when I add at least one term that matches at least a few documents that I start to see this increase in latency.

Can you provide more information/resources about caching and how to disable it? This is a good point that I hadn't taken into consideration.

Check out this blog
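
As a rough sketch of measuring with the caches out of the way: Elasticsearch accepts a `request_cache=false` parameter on `_search` and exposes a `POST /<index>/_cache/clear` endpoint. The host `http://localhost:9200` and the index name `my-index` below are assumptions for illustration, and the helper names are made up. This does not touch the operating system's page cache, so the first run after a restart can still be slower than later ones.

```python
import json
import urllib.request
from urllib.parse import quote

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX = "my-index"                # hypothetical index name


def search_url(base, index):
    # request_cache=false tells Elasticsearch not to use the shard request cache.
    return f"{base}/{quote(index)}/_search?request_cache=false"


def clear_cache_url(base, index):
    # Clears the index's caches (query, request and fielddata) before a measurement.
    return f"{base}/{quote(index)}/_cache/clear"


def post(url, body=None):
    """POST an optional JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def timed_uncached_search(body, base=ES_URL, index=INDEX):
    """Clear the caches, run the search once, and return the server-side 'took' in ms."""
    post(clear_cache_url(base, index))
    result = post(search_url(base, index), body)
    return result["took"]
```

Calling `timed_uncached_search(body)` several times per case and comparing the `took` values gives a fairer picture than a single run.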
