Slow date_histogram after upgrading to 7.3.0 on "dense" indexes

Hi there,

We noticed a serious performance degradation in date_histogram aggregations on our cluster after migrating from ES 6.8.2 to 7.3.0. Queries are at least twice as slow with 7.3.0.

We managed to reproduce the issue in a test environment and compared results between the two versions. For this test we made use of 4 distinct datasets, each made of 70m documents containing a single @timestamp field:

  • gaussian-sameday: this dataset represents the actual distribution of log data during a production day. It is a Gaussian distribution centered around lunch time (more documents during the day than at night). All documents fit within the same day.
  • uniform-sameday: All documents fit within the same day but are evenly distributed (same number of docs every hour).
  • uniform-1s: Documents are spaced a second apart (the first starts at 2000-01-01T00:00:00.000Z, next is 1 second later).
  • uniform-10s: 10 second gap between documents.
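
For anyone who wants to reproduce this without the original files, here is a minimal sketch of how timestamps with these four shapes could be generated. It is not the generator used for the test: the function names, the mean and standard deviation of the Gaussian, and the clamping to one day are all assumptions made for illustration. Only the start date comes from the uniform-1s description above.

```python
import random
from datetime import datetime, timedelta, timezone

# Start date taken from the uniform-1s description; reused for all datasets.
DAY_START = datetime(2000, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86_400


def uniform_spaced(count, step_seconds):
    """uniform-1s / uniform-10s: one document every `step_seconds`."""
    for i in range(count):
        yield DAY_START + timedelta(seconds=i * step_seconds)


def uniform_sameday(count):
    """uniform-sameday: documents spread evenly over a single day."""
    step = SECONDS_PER_DAY / count
    for i in range(count):
        yield DAY_START + timedelta(seconds=i * step)


def gaussian_sameday(count, mean_hour=12.0, stddev_hours=3.0, seed=0):
    """gaussian-sameday: timestamps clustered around midday.

    mean_hour and stddev_hours are illustrative guesses, not the
    original values. Samples falling outside the day are clamped.
    """
    rng = random.Random(seed)
    for _ in range(count):
        seconds = rng.gauss(mean_hour * 3600, stddev_hours * 3600)
        seconds = min(max(seconds, 0.0), SECONDS_PER_DAY - 1)
        yield DAY_START + timedelta(seconds=seconds)


def to_iso(ts):
    """Format a timestamp like 2000-01-01T00:00:00.000Z."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
```

Each generator yields `datetime` objects, so `to_iso(ts)` can be used to build the `@timestamp` value of each document before bulk indexing.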

Each dataset is loaded in its own index configured with a single shard and no replica. The test cluster is made of a single ES node with the default configuration as shipped by Elastic.
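
As a rough sketch, the index settings described above could be built like this. The helper name is hypothetical, and mappings are deliberately left out because their syntax differs between 6.x and 7.x.

```python
import json


def index_body(shards=1, replicas=0):
    """Settings for a single-shard index without replicas (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Body to send when creating each of the four indexes.
print(json.dumps(index_body(), indent=2))
```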

We ran the following query against the 4 datasets with query caching disabled:

{
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 1
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
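
A minimal sketch of how such a measurement could be scripted, for anyone who wants to repeat it. The helper names are hypothetical, and passing `request_cache=false` on the search URL is an assumption about how caching was disabled (it could equally be an index setting). The `run` callable stands for whatever HTTP client call you use to send the request. Note that 7.x logs a deprecation warning for `interval` in favour of `calendar_interval` / `fixed_interval`; `interval` is kept here because it works on both versions.

```python
import time


def build_request(field="@timestamp", interval="1d"):
    """Return (query-string params, body) for the benchmark search."""
    # Assumption: the shard request cache is disabled per request.
    params = {"request_cache": "false"}
    body = {
        "aggs": {
            "2": {
                "date_histogram": {
                    "field": field,
                    "interval": interval,
                    "min_doc_count": 1,
                }
            }
        },
        "query": {"match_all": {}},
    }
    return params, body


def time_call(run, repeats=5):
    """Call run() `repeats` times and return wall-clock durations in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

Wall-clock time includes network and client overhead; the `took` field of the search response reports the server-side time in milliseconds and may be the better number to compare.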

The table below shows the timing in millis:

dataset             6.8.2     7.3.0
--------------------------------------
gaussian-sameday    6520      11450
uniform-sameday     6280      12013
uniform-1s         34836      16199
uniform-10s        34211      16934

As you can see, 6.8.2 is almost twice as fast as 7.3.0 when dates are close to each other (the sameday datasets). However, 7.3.0 is about twice as fast when they are spread over a larger time range.
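
To make the comparison explicit, the ratios can be computed directly from the table. This is plain arithmetic on the numbers above; the dictionary and function names are only illustrative. The sameday datasets come out at roughly 1.76x and 1.91x the 6.8.2 time, while the spread-out datasets take a little under half the 6.8.2 time.

```python
# Timings in milliseconds, copied from the table: (6.8.2, 7.3.0).
TIMINGS_MS = {
    "gaussian-sameday": (6520, 11450),
    "uniform-sameday": (6280, 12013),
    "uniform-1s": (34836, 16199),
    "uniform-10s": (34211, 16934),
}


def slowdown(old_ms, new_ms):
    """Ratio of 7.3.0 time to 6.8.2 time; above 1 means 7.3.0 is slower."""
    return new_ms / old_ms


for name, (old, new) in TIMINGS_MS.items():
    print(f"{name:<18} {slowdown(old, new):.2f}x")
```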

We ran the same queries with different bucket sizes and concluded it has no or little impact as we always observed the same differences between the two versions.

Has anyone experienced the same behaviour?
Is this something we can tune?
Is this a regression?

Thanks for your advice,
/Bertrand

Note: we could not reproduce the issue with the nyc_taxis dataset used by Rally because dates in that dataset are not close enough to each other.


Update: I removed the time_zone parameter from the date_histogram since the scenario does not require it. The timings are different now, but the differences between the two ES versions remain the same.

Any feedback from Elastic members on this?
For info, we got the same results with 7.3.1.

Hey,

can you open a github issue for this for further investigation, please?

--Alex

@spinscale Maybe you could reopen https://github.com/elastic/elasticsearch/issues/45702, or do you want me to create a new one by copying this forum post?

I reopened. If you have any possibilities to share your dataset or any further information to put on that issue, that would be greatly appreciated!

Datasets are all about 50m bzip2 compressed files. Where should I put them? In a temporary public github repo?

maybe upload to gdrive and then provide a link? If you cannot share publicly, but are willing to share privately, you can email me a private link and I'm happy to forward that internally. My email is $firstname.$lastname@elastic.co_without_an_m_at_the_end

--Alex

The four datasets mentioned in my test cases are available at https://drive.google.com/open?id=1ZIQjh00zOFyLoLY8vTEo2Tm2T0ZJ4AnD
