Query/filter ordering performance issue

nwood888 · June 7, 2019, 6:55pm

Elastic version: 6.3.2
Documents: ~120,000,000

We're noticing very odd behavior with how Elastic appears to be ordering filter/query operations. Given the following query:

POST /index/_search
{
  "from": 0,
  "size": 5,
  "query": {
"bool": {
  "must": [
    {
      "term": {
        "customer": "customer-id"
      }
    },
    {
      "query_string": {
        "fields": [
          "filename"
        ],
        "query": "*term*"
      }
    }
  ]
}
  }
}

Running this query takes ~3.5 seconds.
Removing the customer terms query/filter takes ~3.5 seconds
Removing the query_string query takes 35ms and consists of ~1,500 results (down from 120M total docs in the index)

It seems like the query_string operation is being run before the term filter since it takes 3.5 seconds with or without the term filter.

My question is, why doesn't Elastic apply the term filter first so that the query_string operation (more expensive) only runs on 1,500 documents instead of the full 120M?

Ideas I've had but haven't been able to make work yet:

Rearranging the query to get the term filter to run before the query_string. I've tried different variations of nesting must/should/bool.
Passing some type of hint to the query processor so that it knows to run the term filter first. I haven't found any way to do this.

Also, I realize wildcard prefixes cause performance issues, but I don't think that's what's happening here.

Any insight would be appreciated.

Nick

spinscale · June 10, 2019, 1:30pm

try moving the term query into the filter part of the bool query.

Also you should rethink your use of wildcards in the query string query. Maybe using shingles, ngrams or edge-ngrams at index time would yield better results at query time.

nwood888 · June 10, 2019, 4:53pm

Thank you @spinscale. Though I tried this and it still doesn't seem to address the performance issue. It does cause the end result-set to be filtered correctly, but it does not limit the initial search data set so performance is still poor.

POST /index/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "filter": {
          "term": {
            "customer": "customer-id"
          }
      }, 
      "must": [
        {
          "query_string": {
            "fields": [
              "filename"
            ],
            "query": "*term*"
          }
        }
      ]
    }
  }
}

We'll try the other analyzers, but even if we get that working, it seems like this filter issue would provide a massive performance improvement.

spinscale · June 11, 2019, 7:13am

I think that the query string query is so expensive, that the filter does not play a role.

Have you tried using the profile API? See https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-profile.html and https://www.elastic.co/guide/en/kibana/7.1/xpack-profiler.html

nwood888 · June 11, 2019, 5:17pm

Thank you @spinscale. I'll check out the profiler.

For the time being, we've made use of the standard analyzer, ngrams and removed the wildcard prefix and are seeing significant improvements.

system · July 9, 2019, 5:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Slow Wildcard searches are prioritized in Bool queries Elasticsearch	1	174	December 18, 2023
Query performance issue with the very first one Elasticsearch	6	485	March 17, 2020
Wildcard query performance Elasticsearch	8	12124	July 6, 2017
Super slow bool query with terms filter array Elasticsearch	7	1286	December 24, 2018
Slow wildcard query with fast term filters Elasticsearch	2	780	June 21, 2018

Query/filter ordering performance issue

Related topics