Numeric term (and terms) queries much slower after 5.6.15 -> 7.1.1 upgrade

The settings, mapping, query, and data are all the same; the only difference is the ES version.

The following query takes 400+ ms in ES 5 but 600+ ms in ES 7.
Comparing the profiles, we found that:

  1. term queries (and terms queries) on numeric fields are much slower in ES 7 than in ES 5.
  2. match queries are faster in ES 7 than in ES 5.

How can we improve this in ES 7?

Somebody suggested changing the numeric type to keyword, but it had no effect.

Looking forward to your reply, thanks.

settings (partially):

      "analysis" : {
          "filter" : {
            "my_shingle_filter" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "false",
              "type" : "shingle"
            }
          },
          "analyzer" : {
            "my_shingle_analyzer" : {
              "filter" : [
                "my_shingle_filter",
                "lowercase"
              ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          }
        },

mapping:

   "mappings" : {
      "properties" : {
        "ai_tags" : {
          "type" : "long"
        },
        "album_meta" : {
          "properties" : {
            "episode_one" : {
              "type" : "long"
            },
            "tags" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "title" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "type" : {
              "type" : "long"
            }
          }
        },
        "chosen" : {
          "type" : "boolean"
        },
        "comment_count" : {
          "type" : "long"
        },
        "create_time" : {
          "type" : "long"
        },
        "create_type" : {
          "type" : "long"
        },
        "digg_count" : {
          "type" : "long"
        },
        "favorite_count" : {
          "type" : "long"
        },
        "go_detail_count" : {
          "type" : "long"
        },
        "hashtags" : {
          "type" : "keyword"
        },
        "heat" : {
          "type" : "integer"
        },
        "human_tags" : {
          "type" : "long"
        },
        "id" : {
          "type" : "long"
        },
        "impr_count" : {
          "type" : "long"
        },
        "language" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "level" : {
          "type" : "long"
        },
        "link_level" : {
          "type" : "integer"
        },
        "link_title_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        },
        "media_type" : {
          "type" : "integer"
        },
        "post_source" : {
          "type" : "long"
        },
        "region" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "share_count" : {
          "type" : "long"
        },
        "sim_id" : {
          "type" : "long"
        },
        "source" : {
          "type" : "integer"
        },
        "status" : {
          "type" : "integer"
        },
        "text_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        },
        "update_time" : {
          "type" : "long"
        },
        "user_id" : {
          "type" : "long"
        },
        "user_status" : {
          "type" : "integer"
        },
        "username_terms" : {
          "type" : "text",
          "fields" : {
            "shingles" : {
              "type" : "text",
              "analyzer" : "my_shingle_analyzer"
            }
          },
          "analyzer" : "whitespace"
        }
      }
    }
  }

query:
Time cost of each step: left is ES 7, right is ES 5.
Total cost: ES 7 600+ ms, ES 5 400+ ms.

{
  "profile": true,
  "from": 0,
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "functions": [
        {
          "field_value_factor": {
            "field": "favorite_count",
            "missing": 1,
            "modifier": "ln2p"
          }
        },
        {
          "field_value_factor": {
            "field": "digg_count",
            "missing": 1,
            "modifier": "log2p"
          }
        }
      ],
      "query": {
        "bool": {
          "filter": {
            "terms": {
              "media_type": [            //time cost: 143ms     ->89ms
                3,
                5
              ]
            }
          },
          "minimum_should_match": "0",
          "must": [
            {
              "term": {
                "status": 3                           //time cost: 82ms     -->45ms
              }
            },
            {
              "term": {
                "user_status": 100                   //time cost: 72ms       -->35ms
              }
            },
            {
              "bool": {
                "minimum_should_match": "1",
                "should": [                         //time cost: 148ms     -->212ms
                  {
                    "match": {
                      "text_terms": {               
                        "boost": 100000,
                        "query": "微信 动态 搞笑 图片 500张"
                      }
                    }
                  },
                  {
                    "match": {
                      "username_terms": {            
                        "query": "微 信 动 态 搞 笑 图 片 5 0 0 张 "
                      }
                    }
                  },
                  {
                    "match": {
                      "hashtags": {
                        "query": "微信动态搞笑图片500张"  //time cost: negligible
                      }
                    }
                  },
                  {
                    "match": {
                      "hashtags.keyword": {
                        "query": "微信动态搞笑图片500张"
                      }
                    }
                  }
                ]
              }
            }
          ],
          "must_not": [
            {
              "range": {
                "link_level": {               //161 ms      -> 33ms
                  "from": null,
                  "include_lower": true,
                  "include_upper": false,
                  "to": 5
                }
              }
            },
            {
              "terms": {
                "create_type": [            //0.2 ms  --> 0.6ms
                  100,
                  102,
                  103,
                  104
                ]
              }
            },
            {
              "term": {
                "media_type": 5           //time cost: negligible
              }
            }
          ]
        }
      },
      "score_mode": "multiply"
    }
  },
  "size": 3000
}

profile:
[Image: ES 5 profile (cost 400+ ms)]

[Image: ES 7 profile (cost 600+ ms)]

Did it not improve performance, or were you unable to change the mapping?
media_type looks like it should be a keyword field (numeric types like integer are optimised for range queries, not exact-match lookups).
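
For reference, a minimal sketch of what the suggested mapping change could look like, written as a Python dict that serialises to the JSON body for a new index. The field names (`media_type`, `status`, `user_status`) come from the query above; the helper name `keyword_mapping` is hypothetical. An existing field's type cannot be changed in place, so this assumes the data is reindexed into a new index.

```python
import json


def keyword_mapping(fields):
    """Build a mappings fragment that maps each named field as `keyword`.

    `keyword_mapping` is a hypothetical helper for illustration only.
    """
    return {"properties": {name: {"type": "keyword"} for name in fields}}


# Low-cardinality fields from the thread that are only used for exact matches.
body = {"mappings": keyword_mapping(["media_type", "status", "user_status"])}

# Print the JSON that would be sent when creating the new index.
print(json.dumps(body, indent=2))
```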

We tried it, but keyword is even slower than integer, especially for this sub-query:

        "filter": {
            "terms": {
              "media_type": [            //time cost: 143ms     ->89ms
                3,
                5
              ]
            }
          },

With the keyword type, this sub-query costs 300+ ms; with the integer type it costs only 140+ ms, and in ES 5 it costs 89 ms.

Rewriting the sub-query as follows works better: with the keyword type, it costs 160+ ms.

        "filter": {
          "bool": {
            "should": [
              { "term": { "media_type": 3 } },
              { "term": { "media_type": 5 } }
            ]
          }
        }
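
The rewrite above (one `terms` clause turned into a `bool`/`should` of single `term` clauses) can be sketched as a small helper. This is only an illustration, assuming a `terms` clause with exactly one field and no extra options such as `boost`; the name `terms_to_should` is hypothetical.

```python
import json


def terms_to_should(terms_clause):
    """Rewrite {"terms": {field: [v1, v2, ...]}} as a bool/should of term clauses.

    Assumes the clause holds exactly one field and no other keys.
    """
    # Unpack the single (field, values) pair; raises ValueError otherwise.
    (field, values), = terms_clause["terms"].items()
    return {"bool": {"should": [{"term": {field: v}} for v in values]}}


original = {"terms": {"media_type": [3, 5]}}
rewritten = terms_to_should(original)

print(json.dumps({"filter": rewritten}, indent=2))
```

A `bool` that contains only `should` clauses requires at least one of them to match, so the rewritten filter selects the same documents as the `terms` clause.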

However, if status and user_status are changed to keyword, they get a little faster: "status: 3" costs 70+ ms and "user_status: 100" costs 60+ ms. That is still slower than ES 5.

        "must": [
            {
              "term": {
                "status": 3                //time cost: 82ms     -->45ms
              }
            },
            {
              "term": {
                "user_status": 100         //time cost: 72ms       -->35ms
              }
            },

We're always looking for regressions in our nightly benchmarks and this is not one we've observed.

What's the cardinality of the fields you tested? I can't imagine there's that many different media types.

The cardinalities are:
user_status: 4
status: 7
media_type: 6
and there are about 69,000,000 documents in the index.

Thanks for that. Just ran my own terms search benchmarks here with longs vs keywords and 7.6.1 vs 5.6.0.

In all cases: 7.6.1 is better than 5.6.0 and keywords are better than longs.

"my_id": qTerms # V5.6.0 =3.3s V7.6.1 =1.2s

So that is about 3 ms per request in ES 5 and 1.2 ms in ES 7?

We tried again, removing the other sub-queries and keeping only the terms query, but it still costs between 100 ms and 200 ms in ES 5.

There are some differences from your benchmark:

  1. the doc count for each media_type is 11,000,000; yours is 200,000
  2. the type of media_type is integer; yours is long

Besides, can you compare 5.6 with 7.1? Maybe 7.1 and 7.6 differ.

Thanks.

No, 3.3s as in 3.3 seconds to complete the 1,000 searches on random choices of value.
Timings taken after repeated runs (once file system cache has kicked in and response times for a test config stabilise).
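
A rough sketch of the kind of harness described here: time a batch of 1,000 searches on randomly chosen values, and repeat the batch several times so that early (cold-cache) runs can be told apart from the stabilised ones. This is not the actual benchmark script; `time_searches` and `fake_search` are hypothetical names, and `fake_search` stands in for a real call to the cluster.

```python
import random
import time


def time_searches(search, values, n_searches=1000, runs=5, seed=0):
    """Return the wall-clock seconds taken by each run of `n_searches` calls.

    `search` is any callable taking one value (e.g. a function that issues
    a terms query for that value against a cluster).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(n_searches):
            search(rng.choice(values))
        timings.append(time.perf_counter() - start)
    return timings


def fake_search(value):
    # Placeholder: builds the query body instead of sending it anywhere.
    return {"query": {"terms": {"media_type": [value]}}}


timings = time_searches(fake_search, values=[1, 2, 3, 4, 5, 6], runs=3)
# Report the later runs only, once timings have had a chance to stabilise.
print(["%.4fs" % t for t in timings[1:]])
```

Dividing a run's total by `n_searches` gives the per-request average, which is how 3.3 s for 1,000 searches corresponds to roughly 3.3 ms per search.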

I tried with integers, and 7.1 is roughly similar to 7.6.1.
When it comes to 5.6 vs 7.x, there's a big speed-up related to track_total_hits no longer defaulting to an exact count (by default 7.x counts hits accurately only up to 10,000, and it can accelerate matching when it knows you have no aggregations and only need an approximate number of matches). I updated my test to turn this optimisation off for the 7.x tests. Even with this flag set to "true" (the slower mode), 7.x is faster than 5.6 for this test.
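
As an illustration of the flag mentioned above, here is a minimal sketch of a 7.x search body with `track_total_hits` set explicitly, so that 7.x does the same exact counting that 5.6 always does. The helper name `terms_search_body` is hypothetical, and the field and values are taken from the earlier posts; 5.x does not have this option, so the body is only meant for 7.x.

```python
import json


def terms_search_body(field, values, exact_totals):
    """Build a 7.x search body for a filtered terms query.

    `exact_totals=True` asks for an exact total hit count (the slower mode).
    """
    return {
        "track_total_hits": exact_totals,
        "query": {"bool": {"filter": {"terms": {field: values}}}},
    }


exact_body = terms_search_body("media_type", [3, 5], exact_totals=True)
print(json.dumps(exact_body, indent=2))
```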
