Queries on numeric fields are slower after upgrading the cluster from 2.4.5 to 5.6.3

Cluster info

PS: two clusters with the same nodes and docs
Elasticsearch version : 2.4.5/5.6.3
JVM version : java8
OS version: Linux CentOS 6
Nodes: 32(data/master)

Problem

Queries on numeric fields are slower after upgrading the cluster from 2.4.5 to 5.6.3. The avg and tp99 response times of the 5.x cluster are almost twice those of the 2.x cluster.

  • Query
    The field xxx_id is numeric. The query contains 500 to 1000 random xxx_id values.
    { "from": 0, "size": 1000, "timeout": "5000ms", "query": { "query_string": { "query": "xxx_id:(3976321 2681125 3395902 565629 1473422... )" } } }

  • Result of 5.x

    • search result:
      { "took": 1417, "timed_out": false, "_shards": { "total": 4, "successful": 4, "skipped": 0, "failed": 0 }, "hits": { "total": 75350, "max_score": 0, "hits": [...] } }
    • profile result:
      { "took": 1417, "timed_out": false, "_shards": { "total": 4, "successful": 4, "skipped": 0, "failed": 0 }, "hits": {...}, "profile": { "shards": [ { "id": "[xxxx][indexxxxx][2]", "searches": [ { "query": [ { "type": "BooleanQuery", "description": "xxx_id:[3976321 TO 3976321] xxx_id:[2681125 TO 2681125] xxx_id:[3395902 TO 3395902] xxx_id:[565629 TO 565629] xxx_id:[1473422 TO 1473422] xxx_id:[1724922 TO 1724922] xxx_id:[1450020 TO 1450020] xxx_id:[1517222 TO 1517222] xxx_id:[1219222 TO 1219222] xxx_id:[3343724 TO 3343724] xxx_id:[1459926 TO 1459926] xxx_id:[2621525 TO 2621525] xxx_id:[3564826 TO 3564826] xxx_id:[2518929 TO 2518929] xxx_id:[3763028 TO 3763028] xxx_id:[2601623 TO 2601623] xxx_id:[3964521 TO 3964521] xxx_id:[59028 TO 59028] xxx_id:[3697803 TO 3697803] xxx_id:[2287327 TO 2287327] xxx_id:[4192624 TO 4192624] xxx_id:[2455921 TO 2455921] xxx_id:[1652523 TO 1652523] xxx_id:[2548624 TO 2548624] xxx_id:[3450626 TO 3450626] xxx_id:[3447323 TO 3447323] ...", "time": "818.2819640ms", "time_in_nanos": 818281964, "breakdown": { "score": 12800967, "build_scorer_count": 42, "match_count": 0, "create_weight": 655340, "next_doc": 14473065, "match": 0, "create_weight_count": 1, "next_doc_count": 19101, "score_count": 18676, "build_scorer": 765710659, "advance": 24604092, "advance_count": 21 }, "children": [ { "type": "", "description": "xxx_id:[3976321 TO 3976321]", "time": "0.8574110000ms", "time_in_nanos": 857411, "breakdown": { "score": 2363, "build_scorer_count": 42, "match_count": 0, "create_weight": 418, "next_doc": 2590, "match": 0, "create_weight_count": 1, "next_doc_count": 11, "score_count": 11, "build_scorer": 824560, "advance": 27394, "advance_count": 21 } } ... 
] } ], "rewrite_time": 63431, "collector": [ { "name": "CancellableCollector", "reason": "search_cancelled", "time": "22.17472100ms", "time_in_nanos": 22174721, "children": [ { "name": "SimpleTopScoreDocCollector", "reason": "search_top_hits", "time": "14.49816800ms", "time_in_nanos": 14498168 } ] } ] } ], "aggregations": [] } ... ] } }
    • CPU analysis result
  • Result of 2.x
    { "took": 688, "timed_out": false, "_shards": { "total": 4, "successful": 4, "failed": 0 }, "hits": { "total": 75350, "max_score": 0, "hits": [...] } }

Solution?

Is it due to the change of the numeric data structure in 5.0? Can I reindex the field as keyword to solve it?

Basically, yes.

In ES 5.x, numeric fields use a new data structure (the BKD tree). This allows better compression, faster numeric operations and lower memory usage, but it is not ideal for "point lookups" like a term query. That is, it is designed for numeric-style operations like ranges, not single-value lookups.

If that field is only used for exact-match lookups, you can re-index it as a keyword. Keyword fields are optimized for exact-match lookups and will be a lot faster.

More info: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html#_map_identifiers_as_literal_keyword_literal
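As a rough illustration of the keyword approach, the sketch below builds the three request bodies involved: an index creation body that maps xxx_id as keyword, a _reindex body, and a terms query replacing the query_string. The index names (my_index, my_index_v2) and the mapping type name (doc) are placeholders, not taken from the thread, and the bodies are only printed here, not sent to a cluster.

```python
import json


def keyword_mapping(field, doc_type="doc"):
    """Index creation body mapping `field` as keyword (5.x still needs a type name)."""
    return {"mappings": {doc_type: {"properties": {field: {"type": "keyword"}}}}}


def reindex_body(source_index, dest_index):
    """Body for POST _reindex, copying documents into the newly mapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def terms_query(field, ids, size=1000):
    """Exact-match lookup on a keyword field; IDs are sent as strings."""
    return {
        "from": 0,
        "size": size,
        "timeout": "5000ms",
        "query": {"bool": {"filter": {"terms": {field: [str(i) for i in ids]}}}},
    }


if __name__ == "__main__":
    # "my_index" and "my_index_v2" are hypothetical index names.
    print(json.dumps(keyword_mapping("xxx_id"), indent=2))
    print(json.dumps(reindex_body("my_index", "my_index_v2"), indent=2))
    print(json.dumps(terms_query("xxx_id", [3976321, 2681125, 3395902]), indent=2))
```

The terms clause sits in a filter context here on the assumption that scoring is not needed for ID lookups; if the field is also used for range queries or numeric sorting, a keyword mapping alone would not cover that.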

To dive a bit more into the technical details, the BKD data structure doesn't support sorted iteration, so it has to collect all matches, sort the array and then return an iterator over that sorted array (paraphrasing). That happens during the build_scorer step. This isn't bad when dealing with numeric ranges, since the cost is amortized over all the values being iterated over, but it can get expensive when asking for a bunch of individual points.
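The profile output in the question is consistent with this. A small sketch, using the breakdown numbers copied from the top-level BooleanQuery entry of the 5.x profile result, computes how much of the query time each phase accounts for. The helper name phase_shares is made up for illustration.

```python
# Timings (nanoseconds) copied from the BooleanQuery "breakdown" in the 5.x profile result.
breakdown = {
    "score": 12800967,
    "create_weight": 655340,
    "next_doc": 14473065,
    "match": 0,
    "build_scorer": 765710659,
    "advance": 24604092,
}
total_nanos = 818281964  # "time_in_nanos" of the same BooleanQuery entry


def phase_shares(timings, total):
    """Return each phase's fraction of the total query time."""
    return {phase: nanos / total for phase, nanos in timings.items()}


shares = phase_shares(breakdown, total_nanos)
# Print phases from most to least expensive.
for phase, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phase:>14}: {share:6.1%}")
```

On these numbers build_scorer accounts for well over 90% of the time on that shard, while scoring and iteration together are a small remainder.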

2.x didn't have BKD trees, hence the difference in performance.
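A toy model may make the contrast concrete. It is not Lucene's implementation: the point index is reduced to a list of (value, doc_id) pairs ordered by value only, and the inverted index to a dict of already sorted postings lists. All names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def point_scorer(points: List[Tuple[int, int]], value: int) -> Iterator[int]:
    """Toy point lookup: matches come back in index order, not doc ID order,
    so they are collected and sorted before an iterator can be returned.
    (A real tree descends to the matching leaves instead of scanning.)"""
    matches = [doc_id for v, doc_id in points if v == value]
    matches.sort()
    return iter(matches)


def term_scorer(postings: Dict[str, List[int]], term: str) -> Iterator[int]:
    """Toy term lookup: the postings list is stored sorted by doc ID,
    so building the iterator needs no collect-and-sort step."""
    return iter(postings.get(term, []))


# Same three values indexed both ways; doc IDs for value 42 are out of order
# in the point structure on purpose.
points = [(7, 4), (42, 9), (42, 2), (99, 1)]
postings = {"7": [4], "42": [2, 9], "99": [1]}

ids = [42, 7, 99]
point_hits = [list(point_scorer(points, i)) for i in ids]     # one collect+sort per ID
term_hits = [list(term_scorer(postings, str(i))) for i in ids]  # one dict lookup per ID
print(point_hits)
print(term_hits)
```

Both paths return the same documents. The difference is that the point path repeats the collect-and-sort work once per clause, which is the cost that adds up when a query carries hundreds of individual IDs.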

Is there any special encoding done to the terms when it sees that they're all numbers? Kind of like what is done with _id in 6+.

Thank you very much!
