Numeric term (and terms) queries much slower after 5.6.15 -> 7.1.1 upgrade

hyf_bd · March 10, 2020, 9:32am

The settings, mapping, query, and data are all the same; the only difference is the ES version.

The following query takes 400+ ms in ES 5 but 600+ ms in ES 7.
We compared the profiles and found that:

term queries (and terms queries) on numeric fields are much slower in ES 7 than in ES 5.
match queries are faster in ES 7 than in ES 5.

How can we improve this in ES 7?

Somebody suggested changing the numeric types to keyword, but it had no effect.

Looking forward to your reply, thanks.

settings (partial):

      "analysis" : {
          "filter" : {
            "my_shingle_filter" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "false",
              "type" : "shingle"
            }
          },
          "analyzer" : {
            "my_shingle_analyzer" : {
              "filter" : [
                "my_shingle_filter",
                "lowercase"
              ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          }
        },

mapping:

   "mappings" : {
      "properties" : {
        "ai_tags" : {
          "type" : "long"
        },
        "album_meta" : {
          "properties" : {
            "episode_one" : {
              "type" : "long"
            },
            "tags" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "type" : {
              "type" : "long"
            }
          }
        },
        "chosen" : {
          "type" : "boolean"
        },
        "comment_count" : {
          "type" : "long"
        },
        "create_time" : {
          "type" : "long"
        },
        "create_type" : {
          "type" : "long"
        },
        "digg_count" : {
          "type" : "long"
        },
        "favorite_count" : {
          "type" : "long"
        },
        "go_detail_count" : {
          "type" : "long"
        },
        "hashtags" : {
          "type" : "keyword"
        },
        "heat" : {
          "type" : "integer"
        },
        "human_tags" : {
          "type" : "long"
        },
        "id" : {
          "type" : "long"
        },
        "impr_count" : {
          "type" : "long"
        },
        "language" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "level" : {
          "type" : "long"
        },
        "link_level" : {
          "type" : "integer"
        },
        "link_title_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        },
        "media_type" : {
          "type" : "integer"
        },
        "post_source" : {
          "type" : "long"
        },
        "region" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "share_count" : {
          "type" : "long"
        },
        "sim_id" : {
          "type" : "long"
        },
        "source" : {
          "type" : "integer"
        },
        "status" : {
          "type" : "integer"
        },
        "text_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        },
        "update_time" : {
          "type" : "long"
        },
        "user_id" : {
          "type" : "long"
        },
        "user_status" : {
          "type" : "integer"
        },
        "username_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        }
      }
    }
  }

query:
The time cost of each step is given in the comments: the left value is ES 7, the right value is ES 5.
Total cost: 600+ ms in ES 7, 400+ ms in ES 5.

{
  "profile": true,
  "from": 0,
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "functions": [
        {
          "field_value_factor": {
            "field": "favorite_count",
            "missing": 1,
            "modifier": "ln2p"
          }
        },
        {
          "field_value_factor": {
            "field": "digg_count",
            "missing": 1,
            "modifier": "log2p"
          }
        }
      ],
      "query": {
        "bool": {
          "filter": {
            "terms": {
              "media_type": [            //time cost: 143ms     ->89ms
                3,
                5
              ]
            }
          },
          "minimum_should_match": "0",
          "must": [
            {
              "term": {
                "status": 3                           //time cost: 82ms     -->45ms
              }
            },
            {
              "term": {
                "user_status": 100                   //time cost: 72ms       -->35ms
              }
            },
            {
              "bool": {
                "minimum_should_match": "1",
                "should": [                         //time cost: 148ms     -->212ms
                  {
                    "match": {
                      "text_terms": {               
                        "boost": 100000,
                        "query": "微信 动态 搞笑 图片 500张"
                      }
                    }
                  },
                  {
                    "match": {
                      "username_terms": {            
                        "query": "微 信 动 态 搞 笑 图 片 5 0 0 张 "
                      }
                    }
                  },
                  {
                    "match": {
                      "hashtags": {
                        "query": "微信动态搞笑图片500张"  // time cost: negligible
                      }
                    }
                  },
                  {
                    "match": {
                      "hashtags.keyword": {
                        "query": "微信动态搞笑图片500张"
                      }
                    }
                  }
                ]
              }
            }
          ],
          "must_not": [
            {
              "range": {
                "link_level": {               //161 ms      -> 33ms
                  "from": null,
                  "include_lower": true,
                  "include_upper": false,
                  "to": 5
                }
              }
            },
            {
              "terms": {
                "create_type": [            //0.2 ms  --> 0.6ms
                  100,
                  102,
                  103,
                  104
                ]
              }
            },
            {
              "term": {
                "media_type": 5           // time cost: negligible
              }
            }
          ]
        }
      },
      "score_mode": "multiply"
    }
  },
  "size": 3000
}

profile:
[Image: ES 5 profile (cost 400+ ms)]
[Image: ES 7 profile (cost 600+ ms)]
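
For comparing two profiles clause by clause, a small script can be easier than reading the raw output. The sketch below assumes each saved response follows the Profile API layout (`profile.shards[].searches[].query[]` nodes carrying `type`, `description`, `time_in_nanos`, and optional `children`); the function names `clause_timings` and `compare` are made up for illustration. Two caveats: a parent node's time already includes its children, and Lucene may describe the same clause differently across versions, so some rows may not line up between ES 5 and ES 7.

```python
from collections import defaultdict


def clause_timings(profile_response):
    """Sum time_in_nanos per (type, description) over all shards, in ms."""
    totals = defaultdict(int)

    def walk(node):
        totals[(node["type"], node["description"])] += node["time_in_nanos"]
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root)
    # Nanoseconds to milliseconds.
    return {key: nanos / 1e6 for key, nanos in totals.items()}


def compare(es5_response, es7_response):
    """Return (clause, es5_ms, es7_ms) rows, slowest ES 7 clause first."""
    es5 = clause_timings(es5_response)
    es7 = clause_timings(es7_response)
    rows = [(key, es5.get(key, 0.0), es7.get(key, 0.0))
            for key in set(es5) | set(es7)]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The per-clause totals are summed across shards, which run in parallel, so they will not add up to the wall-clock time of the request.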

Mark_Harwood · March 10, 2020, 11:10am

It didn't improve performance or you weren't able to change the mapping?
Media-type looks like it should be a keyword field (numbers like integer are optimised for range queries not exact-match).
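
Since the type of an existing field cannot be changed in place, trying the keyword suggestion means creating a new index and copying the data over. A minimal sketch of the two request bodies, assuming the hypothetical index names `posts` and `posts_v2` and listing only the three fields discussed in this thread (a real mapping would need the remaining fields as well):

```python
import json


def keyword_mapping(fields):
    """Build a 7.x-style mapping that declares each named field as keyword."""
    return {"mappings": {"properties": {name: {"type": "keyword"} for name in fields}}}


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex, copying documents into the re-mapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Hypothetical index names: send create_body with PUT /posts_v2,
# then copy_body with POST /_reindex.
create_body = keyword_mapping(["media_type", "status", "user_status"])
copy_body = reindex_body("posts", "posts_v2")
print(json.dumps(create_body, indent=2))
print(json.dumps(copy_body, indent=2))
```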

hyf_bd · March 10, 2020, 12:35pm

We tried it, but keyword is even slower than integer, especially for this sub-query:

        "filter": {
            "terms": {
              "media_type": [            //time cost: 143ms     ->89ms
                3,
                5
              ]
            }
          },

As keyword, this sub-query costs 300+ ms; as integer it costs only 140+ ms, and in ES 5 it costs 89 ms.

Rewriting the sub-query as follows helps: with the keyword type it then costs 160+ ms.

        "filter": {
          "bool": {
            "should": [
              { "term": { "media_type": 3 } },
              { "term": { "media_type": 5 } }
            ]
          }
        }
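
The rewrite above (one `terms` clause turned into a `bool` with one `term` clause per value) can be applied mechanically when testing several fields. A minimal sketch; the helper name `terms_to_should` is made up, it assumes the `terms` object holds exactly one field and no extra keys such as `boost`, and the explicit `minimum_should_match` is an addition that is not in the original snippet:

```python
def terms_to_should(clause):
    """Rewrite {"terms": {field: [v1, v2]}} into a bool/should of term clauses."""
    # Assumes a single field key inside "terms" (no "boost" or other options).
    (field, values), = clause["terms"].items()
    return {
        "bool": {
            "should": [{"term": {field: value}} for value in values],
            # Added for explicitness: at least one value has to match.
            "minimum_should_match": 1,
        }
    }


rewritten = terms_to_should({"terms": {"media_type": [3, 5]}})
```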

However, if status and user_status are changed to keyword, they get a little faster: "status: 3" costs 70+ ms and "user_status: 100" costs 60+ ms, but that is still slower than ES 5.

        "must": [
            {
              "term": {
                "status": 3                //time cost: 82ms     -->45ms
              }
            },
            {
              "term": {
                "user_status": 100         //time cost: 72ms       -->35ms
              }
            },

Mark_Harwood · March 10, 2020, 3:58pm

We're always looking for regressions in our nightly benchmarks and this is not one we've observed.

What's the cardinality of the fields you tested? I can't imagine there's that many different media types.

zxx_bd · March 11, 2020, 6:50am

For the cardinality,
user_status: 4
status: 7
media_type: 6
There are about 69,000,000 documents in the index.
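
Figures like these can be read from the index with a `cardinality` aggregation. A minimal sketch of the request body, assuming it is sent to the index's `_search` endpoint; the helper name and the `<field>_cardinality` aggregation names are made up. The aggregation is approximate in general, though it is expected to be accurate at counts this low.

```python
def cardinality_request(fields):
    """Build a search body that returns the distinct-value count of each field."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            f"{name}_cardinality": {"cardinality": {"field": name}}
            for name in fields
        },
    }


body = cardinality_request(["user_status", "status", "media_type"])
```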

Mark_Harwood · March 11, 2020, 9:44am

Thanks for that. Just ran my own terms search benchmarks here with longs vs keywords and 7.6.1 vs 5.6.0.

In all cases: 7.6.1 is better than 5.6.0 and keywords are better than longs.

hyf_bd · March 11, 2020, 1:32pm

"my_id": qTerms # V5.6.0 =3.3s V7.6.1 =1.2s

Does that mean 3 ms per request in ES 5 and 1.2 ms in ES 7?

We tried again with all other sub-queries removed, leaving only the terms query, but it still costs between 100 ms and 200 ms in ES 5.

There are some differences from your benchmark:

The doc count for each media_type is 11,000,000; yours is 200,000.
The type of media_type is integer; yours is long.

Also, can you compare 5.6 with 7.1? There may be some difference between 7.1 and 7.6.

Thanks.

Mark_Harwood · March 11, 2020, 2:30pm

No, 3.3s as in 3.3 seconds to complete the 1,000 searches on random choices of value.
Timings taken after repeated runs (once file system cache has kicked in and response times for a test config stabilise).

I tried with integers and 7.1 is roughly similar to 7.6.1.
When it comes to 5.6 vs 7.x there's a big speed-up in relation to track_total_hits defaulting to False (newer versions of Elasticsearch accelerate matching if they know you have no aggregations and only need approximate numbers of matches). I updated my test to turn this optimisation off for 7.x tests. Even with this flag set to "true" (the slower mode) 7.x is faster than 5.6 for this test.
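
The benchmark itself is not shown in the thread. The following is only a sketch of the method as described (1,000 searches on random values, timed after warm-up runs, with `track_total_hits` set to `true` on 7.x), not the actual script. `run_search` is a hypothetical callable that sends a request body to the cluster, and the filter-only `term` query is an assumption. In 7.x the default is to count hits accurately only up to 10,000, which is the shortcut the flag switches off; for 5.6 the key should probably be left out (`exact_totals=False`), since that version is assumed here to predate the option.

```python
import random
import time


def search_body(field, value, exact_totals=True):
    """Filter-only term search; optionally request exact hit counts."""
    body = {"query": {"bool": {"filter": {"term": {field: value}}}}}
    if exact_totals:
        body["track_total_hits"] = True
    return body


def benchmark(run_search, field, values, searches=1000, warmup=100,
              exact_totals=True, seed=0):
    """Return the seconds taken by `searches` requests after `warmup` untimed ones."""
    rng = random.Random(seed)
    for _ in range(warmup):  # let caches and response times settle first
        run_search(search_body(field, rng.choice(values), exact_totals))
    start = time.perf_counter()
    for _ in range(searches):
        run_search(search_body(field, rng.choice(values), exact_totals))
    return time.perf_counter() - start
```

Passing the sender in as `run_search` keeps the timing loop independent of any particular client library, so the same loop can be pointed at a 5.6 and a 7.x cluster in turn.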

system · April 8, 2020, 2:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
