7.2.0 spends more time on query `next_doc` and aggregation `collect` than 6.8.0

hackerwin7 · August 23, 2019, 10:12am

Recently we upgraded our ES cluster to 7.2.0 and found that the same search takes longer than it did on 6.8.0.
To confirm this, we set up two clusters (7.2.0 and 6.8.0) with the same index data set and the same mapping, and ran the same search on both. 7.2.0 was slower than 6.8.0.
Our tests are below:
7.2.0 index list:
6.8.0 index list:

The doc count is the same across the two clusters.

Profiling the same search:

https://gist.github.com/hackerwin7/52912518fcc83cdb51cdce6c04599b0c

680_profile

{
  "took" : 6471,
  "timed_out" : false,
  "_shards" : {
    "total" : 24,
    "successful" : 24,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

This file has been truncated.

720_profile

{
  "took" : 7334,
  "timed_out" : false,
  "_shards" : {
    "total" : 24,
    "successful" : 24,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

This file has been truncated.

After reviewing these profile results, I found that 7.2.0 spends more time on `advance` in the query and on `collect` in the aggregation.

Is the Lucene upgrade the cause of this performance drop?
Looking at the aggregation profile section, some phases are clearly slower on 7.x:

Aggregation profile for one shard (timings in nanoseconds):
| version | collect  | collect_count | build_aggregation | build_aggregation_count |
|---------|----------|---------------|-------------------|-------------------------|
| 7.2     | 63333540 | 632532        | 89351401          | 4                       |
| 6.8     | 40441073 | 632532        | 52753450          | 4                       |
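
To put the table above in relative terms, here is a minimal Python sketch that computes the average time per call and the 7.2 / 6.8 ratio for each phase. The numbers are copied from the table; the helper names (`per_call_ns`, `slowdown`) are only illustrative, and the values are treated as nanoseconds, which is the unit the profile API uses for breakdown timings.

```python
# Values copied from the table above (profile timings are in nanoseconds).
profile = {
    "7.2": {"collect": 63333540, "collect_count": 632532,
            "build_aggregation": 89351401, "build_aggregation_count": 4},
    "6.8": {"collect": 40441073, "collect_count": 632532,
            "build_aggregation": 52753450, "build_aggregation_count": 4},
}


def per_call_ns(stats, phase):
    """Average time per invocation of a phase, in nanoseconds."""
    return stats[phase] / stats[phase + "_count"]


def slowdown(phase):
    """Ratio of 7.2 time to 6.8 time for a phase (>1 means 7.2 is slower)."""
    return profile["7.2"][phase] / profile["6.8"][phase]


for phase in ("collect", "build_aggregation"):
    print(
        phase,
        "7.2 per call: %.1f ns" % per_call_ns(profile["7.2"], phase),
        "6.8 per call: %.1f ns" % per_call_ns(profile["6.8"], phase),
        "slowdown: %.2fx" % slowdown(phase),
    )
```

Since the call counts are identical on both versions, the extra time comes from each call being more expensive on this shard (roughly 1.6 to 1.7 times), not from more calls.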

Bertrand · August 23, 2019, 10:44am

Could this be similar to Slow date_histogram after upgrading to 7.3.0 on "dense" indexes?

hackerwin7 · August 23, 2019, 12:22pm

@Bertrand

Maybe, but I'm not sure what the root cause is. Did you compare your `advance` and `advance_count` between the two versions?

Bertrand · August 23, 2019, 2:28pm

What do you mean by advance and advance_count?

hackerwin7 · August 25, 2019, 2:58am

@Bertrand

see this

hackerwin7 · August 26, 2019, 3:15am

Any benchmark reports for these versions?

Bertrand · August 26, 2019, 12:24pm

I made an extra run with the "uniform-sameday" dataset and query profiling enabled.
To recap:

70m docs, each with a single @timestamp field; all dates fall on the same day, evenly distributed over 24h
3 nodes, 3 shards, no replica

Results for both 6.8.2 and 7.3.0 are as follows:

        timing           advance               advance_count
                 shard1, shard2, shard3    shard1, shard2, shard3
--------------------------------------------------------------------
6.8.2     6705        0,      0,      0         0,      0,      0
7.3.0    10130    26094,  14987,  14778        23,     18,     16

I have no idea what advance and advance_count represent - but values are not the same for both versions.

Complete query profile results are available at:

The query was:

{
  "profile": true,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 1
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
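
For anyone who wants to compare counters such as `advance`, `advance_count`, `next_doc` and `collect` per shard across two versions, the following is a minimal Python sketch that flattens a profile response into rows. It assumes the documented Profile API layout (`profile.shards[]`, each with `searches[].query[]` and `aggregations[]`, whose nodes carry a `breakdown` and optional `children`). The `sample` response, its shard id and all its numbers are made up for illustration.

```python
def walk(nodes, depth=0):
    """Yield (depth, node) for every profile node, parents before children."""
    for node in nodes:
        yield depth, node
        yield from walk(node.get("children", []), depth + 1)


def breakdown_rows(response, keys):
    """Collect the requested breakdown counters for every query and
    aggregation node, one row per node per shard."""
    rows = []
    for shard in response["profile"]["shards"]:
        query_nodes = [
            node
            for search in shard.get("searches", [])
            for node in search.get("query", [])
        ]
        sections = (
            ("query", query_nodes),
            ("aggregation", shard.get("aggregations", [])),
        )
        for section, nodes in sections:
            for depth, node in walk(nodes):
                breakdown = node.get("breakdown", {})
                row = {
                    "shard": shard["id"],
                    "section": section,
                    "depth": depth,
                    "type": node.get("type"),
                }
                # Counters missing from a node's breakdown are reported as 0.
                row.update({key: breakdown.get(key, 0) for key in keys})
                rows.append(row)
    return rows


# Hypothetical, heavily trimmed profile response (all values invented).
sample = {
    "profile": {"shards": [{
        "id": "[node-1][my-index][0]",
        "searches": [{"query": [{
            "type": "MatchAllDocsQuery",
            "breakdown": {"advance": 120, "advance_count": 3,
                          "next_doc": 900, "next_doc_count": 50},
        }]}],
        "aggregations": [{
            "type": "DateHistogramAggregator",
            "breakdown": {"collect": 4000, "collect_count": 50},
        }],
    }]}
}

for row in breakdown_rows(sample, ["advance", "advance_count", "next_doc", "collect"]):
    print(row)
```

Loading the saved profile output from each cluster with `json.load` and passing it to `breakdown_rows` gives two lists that can be compared node by node. Rows are kept per node and are not summed across parents and children, to avoid counting nested time twice.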

hackerwin7 · August 27, 2019, 6:12am

@Bertrand

My case is similar, but I want to know why 7.x has a higher `advance` cost than 6.8. Is it because of a more accurate scorer for Lucene docs?

hackerwin7 · August 28, 2019, 8:22am

@Bertrand

I removed the date_histogram from my search and used only a terms aggregation on both clusters; the 7.2 search still takes longer than the 6.8 search.

ted_ye · September 10, 2019, 9:05am

@hackerwin7
Hello, have you found an answer to this problem?
I've hit the same problem: same machine, same index, different versions of Elasticsearch (7.3.1 vs 6.2.4). When I search the logs in Kibana with the same query request, it takes longer on 7.3.1 than on 6.2.4.

hackerwin7 · September 10, 2019, 10:26am

@ted_ye

I still have no idea. I think the problem is in the Aggregator.getCollector().collect() function: 7.x spends more time in collect() during the Lucene query phase.

dadoonet · September 11, 2019, 6:48am

@jimczi any idea?

jimczi · September 12, 2019, 12:27pm

This is related to the migration from Joda to Java time. We're investigating the slowdown (that we are now able to reproduce easily) in https://github.com/elastic/elasticsearch/issues/45702#issuecomment-530756419 and we have an idea on how to restore the performance of 6.8. Stay tuned.
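
As background on why date handling shows up in `collect`: a date_histogram has to round each collected timestamp down to the start of its bucket, so that rounding cost is paid once per collected document. The Python sketch below only illustrates the idea for a 1d interval in UTC. It is not Elasticsearch's implementation (which also has to deal with time zones), and the function names and sample timestamps are made up.

```python
from datetime import datetime, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def round_down_to_utc_day(epoch_millis):
    """Return the epoch-millisecond timestamp of 00:00 UTC on the same day."""
    return epoch_millis - (epoch_millis % MILLIS_PER_DAY)


def bucket_counts(timestamps):
    """Count documents per day bucket, rounding once per collected document."""
    counts = {}
    for ts in timestamps:
        key = round_down_to_utc_day(ts)
        counts[key] = counts.get(key, 0) + 1
    return counts


def millis(year, month, day, hour):
    """Epoch milliseconds for a UTC wall-clock time (helper for the sample)."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


# Hypothetical sample: three timestamps on 2019-08-23 and one on 2019-08-24.
docs = [
    millis(2019, 8, 23, 1),
    millis(2019, 8, 23, 12),
    millis(2019, 8, 23, 23),
    millis(2019, 8, 24, 0),
]

for key, count in sorted(bucket_counts(docs).items()):
    day = datetime.fromtimestamp(key / 1000, tz=timezone.utc).date()
    print(day, count)
```

With hundreds of thousands of `collect` calls per shard, even a small increase in the per-document rounding cost adds up.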

hackerwin7 · September 12, 2019, 1:46pm

@jimczi
Nice catch! I'm interested in following the issue.
