ES query performance when searching on filed with wildcard in field name

Youxu · February 20, 2017, 2:02am

I saw significant performance difference between searching on field with exact field name and searching on field with wildcard suffix i name.

Following is my mapping:

PUT /myindex
{
     "mappings": {
      "docs": {
          "Locale": {
            "type": "text",
            "fields": {
                "lowercaseraw": {
                     "type": "text",
                     "analyzer": "lowercase_keyword"
                }
            },
            "analyzer": "standard"
          },
          "Title": {
            "type": "text",
            "analyzer": "standard"
          },
          "Title_english": {
            "type": "text",
            "analyzer": "english"
          },
          "Title_french": {
            "type": "text",
            "analyzer": "french"
          },
          "Title_finnish": {
            "type": "text",
            "analyzer": "finnish"
          }
     }
}

Note: There are totally around 50 Title_ fields each of which use a specific ES language analyzer. I just list a few as example.*

When ingesting document, I will insert the doc with TWO Title, one is for "Title", the other is for "Title_xxx", "xxx" will be determined by the language of the doc.

For example, if the doc is english, then:

POST /myindex/docs/doc_1
{
  "Title": "Elasticsearch 5.0 is released!",
  "Title_english": "Elasticsearch 5.0 is released!",
  "Locale": "en-us"
}

If the doc is french, then:

POST /myindex/docs/doc_1
{
  "Title": "Elasticsearch 5.0 is released!",
  "Title_french": "Elasticsearch 5.0 is released!",
  "Locale": "fr-fr"
}

Similar to other languages.

So, although there are 50 around Title_* fields in mapping, there is always TWO Title fields has value.

Then I saw significant query performance difference when searching english documents with following two different query DSL:

{
  "query" : {
    "bool": {
        "must": {
          "multi_match" : {
            "query" : "elasticserach 5",
            "fields" : ["Title_english"],
            "type" : "phrase",
            "slop" : 5,
            "boost" : 10
          }
        },
      "filter" : {
        "bool" : {
          "must" : 
            {
                "match" : {
                  "Locale.lowercaseraw" : "en-us"
                }
            }
        }
      }
    }
  }
}

{
  "query" : {
    "bool": {
        "must": {
          "multi_match" : {
            "query" : "elasticsearch 5",
            "fields" : ["Title_*"],
            "type" : "phrase",
            "slop" : 5,
            "boost" : 10
          }
        },
      "filter" : {
        "bool" : {
          "must" : 
            {
                "match" : {
                  "Locale.lowercaseraw" : "en-us"
                }
            }
        }
      }
    }
  }
}

The only difference is, that the #1 DSL use "Title_english", the #2 use "Title_", and please be noticed that the filter makes sure only english documents match the query, that is, only Title and Title_english in the matched documents have value, all other Title_ is null.

Per my rough perf testing using JMeter (200 QPS), the perf result is as follows:
#1:
Avg. CPU usage: 33%
Avg. Response Time: 26 ms

#2
Avg. CPU usage: 67%
Avg. Response Time: 50 ms

Although #2 use "Title_*", the ES actually search on one field, "Title_english", since the filter makes sure only "Title_english" has value.

So why so big difference? Is it bad practice using wildcard suffix in search field name?

warkolm · February 24, 2017, 10:27am

Why? Because each doc needs to be checked to see if any of the fields begin in Title_, that takes time.
It's like a full table scan in a RDBMS.

Youxu · February 24, 2017, 10:32am

It makes sense, but it is interesting that if Title_xxx field value is empty, the query on ["Title", "Title_*"] perf is quite good. Whereas if Title_english is not empty, query only on ["Title_*"] got very bad performance.
But in two cases, only one field is searched, and all used wildcard.

system · March 24, 2017, 10:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ES5 - Search in wildcard field name Elasticsearch	5	1122	May 24, 2017
Query on fieldname with wildcard? Elasticsearch	4	2359	December 18, 2018
Question about wildcard query Elasticsearch	9	479	May 5, 2021
Wildcard queries slow since ES 5.x Elasticsearch	2	994	December 27, 2022
Same field name and mapping but different "index" options? Elasticsearch	3	1244	June 30, 2017

ES query performance when searching on filed with wildcard in field name

Related topics