Query slowdown from 2.4 -> 6.1


(Tejas Mandke) #1

The following query became about 5x slower when we switched from ES 2.4 to 6.1:

{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "abcd",
            "fields": [
              "name",
              "note",
              "url",
              "email",
              "email.filter_ngram",
              "title",
              "top_document_external_url^30",
              "top_document_external_url_parts"
            ],
            "type": "most_fields"
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "_type": "document_group"
                      }
                    },
                    {
                      "term": {
                        "hidden": false
                      }
                    },
                    {
                      "terms": {
                        "folder_accessors": [
                          "Team#23",
                          "Team#8368",
                          "User#66731",
                          "User#66731"
                        ]
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "_type": "link"
                      }
                    },
                    {
                      "term": {
                        "hidden": false
                      }
                    },
                    {
                      "terms": {
                        "user_id": [
                          66731
                        ]
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "terms": {
                        "_type": [
                          "contact",
                          "account"
                        ]
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "terms": {
                              "user_ids": [
                                66731
                              ]
                            }
                          },
                          {
                            "term": {
                              "company_id": 21
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "_type": "bundle"
                      }
                    },
                    {
                      "term": {
                        "user_company_id": 21
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "terms": {
                        "_type": [
                          "campaign_link"
                        ]
                      }
                    },
                    {
                      "term": {
                        "hidden": false
                      }
                    },
                    {
                      "term": {
                        "user_id": 66731
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 15
}

Differences/Similarities

  • ES 2.4: 1 shard, 1 replica; ES 6.1: 2 shards, 1 replica
  • Mappings are identical
    • string fields in the filter section are all keyword
    • string fields in the match section all use the standard analyzer with an edge_ngram filter (min 3, max 20)
    • number fields are all of type long

Profile breakdown

# ES 2.4
"breakdown": {
                  "score": 0,
                  "create_weight": 35429889,
                  "next_doc": 3178747,
                  "match": 0,
                  "build_scorer": 862907,
                  "advance": 0
                }
# ES 6.1 shard 0
"breakdown": {
                  "score": 0,
                  "build_scorer_count": 17,
                  "match_count": 0,
                  "create_weight": 102391897,
                  "next_doc": 0,
                  "match": 0,
                  "create_weight_count": 1,
                  "next_doc_count": 0,
                  "score_count": 0,
                  "build_scorer": 99745,
                  "advance": 0,
                  "advance_count": 0
                }
# ES 6.1 shard 1
"breakdown": {
                  "score": 0,
                  "build_scorer_count": 15,
                  "match_count": 0,
                  "create_weight": 216820737,
                  "next_doc": 0,
                  "match": 0,
                  "create_weight_count": 1,
                  "next_doc_count": 0,
                  "score_count": 0,
                  "build_scorer": 119353,
                  "advance": 0,
                  "advance_count": 0
                }

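Based on this, create_weight looks like the main cause of the slowdown: about 35 ms on 2.4 versus about 102 ms and 217 ms on the two 6.1 shards (breakdown timings are in nanoseconds). Any thoughts on how this can be improved?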


(David Pilato) #2

Thanks for the details. Pinging @jpountz who might have ideas.


(Adrien Grand) #3

Those versions differ in many ways, but I think this is likely because your identifiers are mapped as numbers, whereas they would perform better as keyword: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html#_map_identifiers_as_literal_keyword_literal

In 5.0, numeric fields were refactored in Elasticsearch in order to be more space-efficient and faster at range queries. However this came with a downside: term and terms queries are now slower on numeric fields. I don't think there are reasons why you would need to run range queries on those id fields, so you could map them as a keyword instead.
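As a rough sketch, the id fields from the query above could be mapped like this when creating the 6.1 index. The type name `doc` is a placeholder (an assumption, not taken from your setup), and the field list is simply the numeric fields that appear in the filters of post #1:

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "user_id":         { "type": "keyword" },
        "user_ids":        { "type": "keyword" },
        "company_id":      { "type": "keyword" },
        "user_company_id": { "type": "keyword" }
      }
    }
  }
}
```

Since the type of an existing field cannot be changed in place, this would mean creating a new index with the updated mapping and reindexing into it.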

Note that you do not need to make it a string in the JSON document. Elasticsearch will happily accept numbers for a keyword field; it will just store them internally as their string representation, e.g. "5" for 5.
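For illustration (a sketch, assuming `user_id` has been remapped to keyword as suggested), a document whose source contains `"user_id": 66731` as a JSON number should still match a filter written with a numeric value, since both sides end up as the string "66731":

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": 66731 } }
      ]
    }
  }
}
```

So the existing filters in the query should not need to be rewritten after the mapping change.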


(Tejas Mandke) #4

Thanks @jpountz, but it looks like most of the time is spent in the TermQuery create_weight.

The name field uses

            "analyzer": "filter_based_edge_ngram_analyzer",
            "search_analyzer": "standard"

and filter_based_edge_ngram_analyzer is

:analysis => {
  :filter => {
    :edge_ngram_filter => {
      :type => 'edge_ngram',
      :min_gram => MIN_NGRAM_LENGTH.to_s,
      :max_gram => MAX_NGRAM_LENGTH.to_s
    }
  },
  :analyzer => {
    :filter_based_edge_ngram_analyzer => {
      :type => 'custom',
      :tokenizer => 'standard',
      :filter => %w(lowercase edge_ngram_filter)
    }
  }
}


(Adrien Grand) #5

This shouldn't have changed much between 2.4 and 6.1. Is the slowdown consistently reproducible, or does it only occur on a cold index?

How long does the query take to run with profiling disabled?

