Dramatic performance degradation on range queries after upgrading ES from 2.4 to 7.1

Hi,

I have an Elasticsearch cluster of version 2.4 (Lucene version 5.5.4). Recently I installed a new cluster of version 7.1 (Lucene version 8.1.0) and reindexed all the data (~2.7 billion documents). Both clusters use Java 1.8.

But I noticed a significant increase in latency for queries with range clauses:

...
 {
      "range": {
        "metadata.sums": {
          "from": 1127039.0,
          "to": 1127040.0,
          "include_lower": true,
          "include_upper": false,
          "boost": 1.0
        }
      }
...

The old cluster executes that query in 304 msec, but on the new cluster the same query takes 20 sec.

I can't understand what the problem is. What has changed? How can I fix it?

Thanks

Could you share the mapping for that field?

of course:

...
"metadata": {
    "properties": {
        ...,
        "sums": {
            "type": "double"
        }
    }
},
...

That should perform OK. Would it be possible to run the query using the profile API? Hopefully that will tell us what the query is doing.

That's the heaviest part (in fact, there are 4 parts like this):

{
   "type":"IndexOrDocValuesQuery",
   "description":"metadata.sums:[1128539.0 TO 1128539.9999999998]",
   "time_in_nanos":3903830947,
   "breakdown":{
      "set_min_competitive_score_count":0,
      "match_count":0,
      "shallow_advance_count":45,
      "set_min_competitive_score":0,
      "next_doc":0,
      "match":0,
      "next_doc_count":0,
      "score_count":0,
      "compute_max_score_count":45,
      "compute_max_score":6769,
      "advance":3903093539,
      "advance_count":15,
      "score":0,
      "build_scorer_count":75,
      "create_weight":6020,
      "shallow_advance":9449,
      "create_weight_count":1,
      "build_scorer":714989
   }
}

It takes 3.9 sec.
I am wondering what advance means...

IndexOrDocValuesQuery is a type of query that, depending on the execution, decides whether the condition is resolved using the index or using doc values (brute force). Imagine you have a query with two filters, A and B. If A only matches a handful of documents, it might be expensive to resolve B using the index; instead we can check those few documents directly using the existing doc values (brute-force approach).
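For example, in a query shaped like the one below (the term field and value are just placeholders for illustration), if the term filter matches only a few documents, it can be cheaper to verify the range per matching document via doc values than to walk the points index for the range:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "some.field": { "value": "rare-value" } } },
        { "range": { "metadata.sums": { "gte": 1127039.0, "lt": 1127040.0 } } }
      ]
    }
  }
}
```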

I am speculating about one idea, but I'm afraid I need more information. I am wondering whether the field metadata.sums is actually holding arrays of doubles? In addition, what is the topology of the index: number of shards? number of documents per shard?

First, about the A and B filters. Actually the number of such filters (should clauses) is not limited (although even with only one such condition the query executes for 5+ sec).

Then, about the configuration. There are 2.7 billion documents in each cluster.
Old cluster: 16 shards (from 121815021 to 398345655 docs per shard).
New cluster: 32 shards (from 40839388 to 178946503 docs per shard).

The field metadata.sums holds sums of money - yes, it is an array.

Here you can read what has changed in Elasticsearch regarding numeric fields:

I am a bit puzzled that even with only one range condition the query still takes 5+ seconds. Could you run it with only that condition and with the profiler enabled, to see where it is spending most of the time?

Hm...
If I execute such a query:

{
   "size":100,
   "profile":true,
   "query":{
      "bool":{
         "should":[
            {
               "range":{
                  "metadata.sums":{
                     "from":1128539.0,
                     "to":1128540.0,
                     "include_lower":true,
                     "include_upper":false,
                     "boost":1.0
                  }
               }
            }
         ],
         "minimum_should_match":"1",
         "boost":1.0
      }
   }
}

then I get a response very quickly:

{
   "took":3,
   "timed_out":false,
   "_shards":{
      "total":1,
      "successful":1,
      "skipped":0,
      "failed":0
   },
   "hits":{
      "total":{
         "value":0,
         "relation":"eq"
      },
      "max_score":null,
      "hits":[

      ]
   },
   "profile":{
      "shards":[
         {
            "id":"[7k_HXKpPQBiijcQNPpiUCg][documents-v3][4]",
            "searches":[
               {
                  "query":[
                     {
                        "type":"IndexOrDocValuesQuery",
                        "description":"metadata.sums:[1128539.0 TO 1128539.9999999998]",
                        "time_in_nanos":1961299,
                        "breakdown":{
                           "set_min_competitive_score_count":0,
                           "match_count":0,
                           "shallow_advance_count":0,
                           "set_min_competitive_score":0,
                           "next_doc":0,
                           "match":0,
                           "next_doc_count":0,
                           "score_count":0,
                           "compute_max_score_count":0,
                           "compute_max_score":0,
                           "advance":6050,
                           "advance_count":41,
                           "score":0,
                           "build_scorer_count":84,
                           "create_weight":1187,
                           "shallow_advance":0,
                           "create_weight_count":1,
                           "build_scorer":1953936
                        }
                     }
                  ],
                  "rewrite_time":50759,
                  "collector":[
                     {
                        "name":"CancellableCollector",
                        "reason":"search_cancelled",
                        "time_in_nanos":18202,
                        "children":[
                           {
                              "name":"SimpleTopScoreDocCollector",
                              "reason":"search_top_hits",
                              "time_in_nanos":7412
                           }
                        ]
                     }
                  ]
               }
            ],
            "aggregations":[

            ]
         }
      ]
   }
}

The IndexOrDocValuesQuery takes 1.9 msec and is cancelled.

The interesting thing is that I have probably found the condition without which the query works quickly.
The index contains documents of many organizations. Each organization has its own boxId parameter and every document has a boxId field. The first condition of any query on the index is:

{
   "filter":[
      {
         "term":{
            "id.boxId":{
               "value":"{some boxId}",
               "boost":1.0
            }
         },
         ...
      }
   ]
}
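So, for clarity, the whole slow query roughly combines that filter with the range clause shown earlier (values are just examples):

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "id.boxId": { "value": "{some boxId}" } } }
      ],
      "should": [
        {
          "range": {
            "metadata.sums": {
              "from": 1128539.0,
              "to": 1128540.0,
              "include_lower": true,
              "include_upper": false
            }
          }
        }
      ],
      "minimum_should_match": "1"
    }
  }
}
```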

If I remove this condition, the query is very fast.
But why does the old cluster work fine?

That actually makes sense and I think it confirms what I was thinking. The problem is that in newer versions of ES we use this IndexOrDocValuesQuery from Lucene, which decides to execute the query either using the index or doc values, depending on some heuristics.

In your case, you are probably storing quite big arrays in the sums field, which skews those heuristics, so we make a bad decision on how to execute the query.

One way to overcome this is to index that field without doc values. The side effect is that you cannot use that field for sorting or aggregations.
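In the mapping that would look something like this (note that you would need to reindex your data for the change to take effect):

```json
"metadata": {
    "properties": {
        "sums": {
            "type": "double",
            "doc_values": false
        }
    }
}
```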

Meanwhile, I am opening an issue in Lucene to improve those heuristics.

FYI: https://issues.apache.org/jira/browse/LUCENE-8990

Wow!
I am not sure about quite big arrays, but not using doc values on that field works for me, because I don't need sorting or aggregations on that field.

I'll try. I don't know how long it will take (I need to add storage). And I've subscribed to the ticket.

Thank you!

Please let us know the outcome. Thanks for reporting!

@einfoman I think that the bug that Ignacio found is real, but I'm also surprised it took so long to only advance the doc-value iterator 15 times. Are you on an especially busy/slow disk? If you still have that cluster and if that very slow response time is reproducible, I'd be curious if you could run the query in a tight loop and collect nodes hot threads concurrently to try to understand what the bottleneck of that query is.

@Ignacio_Vera I reindexed the data without doc_values. It worked! Thank you for this workaround.


@jpountz The disks are SSDs. I wouldn't say that they are busy.

The old index (with doc_values) is still alive. It's night now and the workload is very small. I executed the query and ran _nodes/hot_threads during it. Here is the result (I got the same output every time during the query execution): https://gist.github.com/einfoman/77081cca975f180c717e3ac1b7c4a239

I don't know if it is)

Thanks @einfoman, this answers my question. Your query is also suffering from the fact that disjunctions don't work well with two-phase iterators, which are created by phrase queries, script queries and sometimes range queries. Ignacio's suggested workaround happens to address this issue as well.

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.