7.2.0 spends more time on query `next_doc` and aggregation `collect` than 6.8.0

We recently upgraded our ES cluster to 7.2.0 and found that the same search takes longer than it did on 6.8.0.
To compare, we set up two clusters (7.2.0 and 6.8.0) with the same index data set and the same mapping, and executed the same search on both. 7.2.0 was slower than 6.8.0.
Our tests are below:
7.2.0 index list:
6.8.0 index list :

The doc count is the same across the two clusters.

profile the same search:

After reviewing these profile results, I found that 7.2.0 spends more time on `advance` in the query and on `collect` in the aggregation.

Is the Lucene upgrade causing this performance drop?
Looking at the aggregation profile section, some phases are much slower in 7.x:

The aggregation profile of one shard, as an example (breakdown timings are in nanoseconds):
| version | collect  | collect_count | build_aggregation | build_aggregation_count |
|---------|----------|---------------|-------------------|-------------------------|
| 7.2     | 63333540 | 632532        | 89351401          | 4                       |
| 6.8     | 40441073 | 632532        | 52753450          | 4                       |
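
The gap in the table above can be quantified with a little arithmetic. This is a minimal sketch in Python that uses only the numbers from the table. It assumes the breakdown values are nanoseconds, as the Elasticsearch profile API reports them, and the helper names `slowdown` and `per_call_ns` are made up for illustration:

```python
# Breakdown values copied from the shard aggregation profile table above.
# Assumption: timings are in nanoseconds, as reported by the profile API.
profile = {
    "7.2": {
        "collect": 63333540,
        "collect_count": 632532,
        "build_aggregation": 89351401,
        "build_aggregation_count": 4,
    },
    "6.8": {
        "collect": 40441073,
        "collect_count": 632532,
        "build_aggregation": 52753450,
        "build_aggregation_count": 4,
    },
}


def slowdown(phase):
    """Ratio of time spent in a phase on 7.2 relative to 6.8."""
    return profile["7.2"][phase] / profile["6.8"][phase]


def per_call_ns(version, phase):
    """Average time per invocation of a phase, in nanoseconds."""
    return profile[version][phase] / profile[version][phase + "_count"]


for phase in ("collect", "build_aggregation"):
    print(f"{phase}: 7.2 / 6.8 = {slowdown(phase):.2f}x")

for version in ("7.2", "6.8"):
    print(f"{version}: {per_call_ns(version, 'collect'):.1f} ns per collect call")
```

Since `collect_count` is identical on both versions, the extra time comes from each `collect` call being more expensive, not from more calls being made.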

Could this be similar to Slow date_histogram after upgrading to 7.3.0 on "dense" indexes?

@Bertrand

Maybe, but I'm not sure what the root cause is. Did you compare `advance` and `advance_count` between the two versions?

What do you mean by advance and advance_count?

@Bertrand

see this

Any benchmark reports for these versions?

I made an extra run with the "uniform-sameday" dataset and query profiling enabled.
To recap:

  • 70m docs, each with a single @timestamp field - dates are all in the same day, evenly distributed over 24h
  • 3 nodes, 3 shards, no replica

Results for both 6.8.2 and 7.3.0 are as follows:

| version | timing | advance (shard1, shard2, shard3) | advance_count (shard1, shard2, shard3) |
|---------|--------|----------------------------------|----------------------------------------|
| 6.8.2   | 6705   | 0, 0, 0                          | 0, 0, 0                                |
| 7.3.0   | 10130  | 26094, 14987, 14778              | 23, 18, 16                             |

I have no idea what advance and advance_count represent - but values are not the same for both versions.

Complete query profile results are available at:

The query was:

{
  "profile": true,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 1
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
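
For anyone who wants to compare `advance` and `advance_count` between versions themselves, here is a minimal sketch in Python that pulls one breakdown counter per shard out of a response to a `"profile": true` search. It assumes the usual profile layout (`profile.shards[].searches[].query[]`, each node with a `breakdown` and optional `children`). The function names and the sample response are hypothetical, and the sample is heavily trimmed; only the `advance` and `advance_count` values are borrowed from the 7.3.0 shard1 figures in this thread.

```python
def sum_breakdown(node, key):
    """Sum one breakdown counter over a profiled query node and its children."""
    total = node.get("breakdown", {}).get(key, 0)
    for child in node.get("children", []):
        total += sum_breakdown(child, key)
    return total


def per_shard(profile_response, key):
    """Return {shard id: summed breakdown value} for the query section."""
    result = {}
    for shard in profile_response["profile"]["shards"]:
        total = 0
        for search in shard["searches"]:
            for query_node in search["query"]:
                total += sum_breakdown(query_node, key)
        result[shard["id"]] = total
    return result


# Hypothetical, trimmed-down response: the shard id and query type are
# placeholders, not taken from the real profile output.
sample = {
    "profile": {
        "shards": [
            {
                "id": "[node-a][my-index][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "MatchAllDocsQuery",
                                "breakdown": {"advance": 26094, "advance_count": 23},
                            }
                        ]
                    }
                ],
            }
        ]
    }
}

print(per_shard(sample, "advance"))
print(per_shard(sample, "advance_count"))
```

Running the same function on the saved responses from both clusters gives numbers that can be compared shard by shard.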

@Bertrand

This is similar to my case, but I want to know why 7.x has a higher `advance` cost than 6.8. Is it because of a more accurate scorer for Lucene docs?

@Bertrand

I removed the date_histogram from my search and used only a terms aggregation on both clusters; the 7.2 search still takes longer than the 6.8 search.

@hackerwin7
Hello, have you found an answer to this problem?
I've run into the same problem: same machine, same index, different versions of Elasticsearch (7.3.1 vs 6.2.4). When I search the logs in Kibana with the same query request, it takes longer on 7.3.1 than on 6.2.4.

@ted_ye

I still have no idea about this. I think the problem is in the Aggregator.getCollector().collect() function: 7.x spends more time in collect() during the Lucene query phase.

@jimczi any idea?

This is related to the migration from Joda to Java time. We're investigating the slowdown (that we are now able to reproduce easily) in https://github.com/elastic/elasticsearch/issues/45702#issuecomment-530756419 and we have an idea on how to restore the performance of 6.8. Stay tuned :wink:

@jimczi
Nice catch! I'm interested in following the issue.