Hi there,
We noticed a serious performance degradation of date_histogram aggregations on our cluster after migrating from ES 6.8.2 to 7.3.0. Queries are at least 2 times slower with 7.3.0.
We managed to reproduce the issue in a test environment and compared results between the two versions. For this test we used 4 distinct datasets, each made of 70m documents containing a single @timestamp field:
- gaussian-sameday: this dataset represents the actual distribution of log data during a production day. It is a gaussian distribution centered around lunch time (more documents during the day than at night). All documents fit within the same day.
- uniform-sameday: all documents fit within the same day but are evenly distributed (same number of docs every hour).
- uniform-1s: documents are spaced a second apart (the first starts at 2000-01-01T00:00:00.000Z, the next is 1 second later).
- uniform-10s: 10 second gap between documents.
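For anyone who wants to build comparable data, here is a minimal Python sketch of how the four timestamp distributions could be generated. It is not the script we used: the function names, the gaussian mean and standard deviation, the clamping to one day, and the 2000-01-01 start date for the sameday datasets are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

# Start date taken from the uniform-1s description; reusing it for the
# sameday datasets is an assumption.
DAY_START = datetime(2000, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86_400


def uniform_spaced(n, step_seconds):
    """Yield n timestamps spaced step_seconds apart (uniform-1s, uniform-10s)."""
    for i in range(n):
        yield DAY_START + timedelta(seconds=i * step_seconds)


def uniform_sameday(n):
    """Yield n timestamps spread evenly across a single day."""
    for i in range(n):
        yield DAY_START + timedelta(seconds=SECONDS_PER_DAY * i / n)


def gaussian_sameday(n, mean_hour=12.5, stddev_hours=3.0, seed=0):
    """Yield n timestamps normally distributed around mean_hour.

    mean_hour and stddev_hours are hypothetical values. Samples falling
    outside the day are clamped to its boundaries so that every
    document stays within the same day.
    """
    rng = random.Random(seed)
    for _ in range(n):
        secs = rng.gauss(mean_hour * 3600, stddev_hours * 3600)
        secs = min(max(secs, 0.0), SECONDS_PER_DAY - 0.001)
        yield DAY_START + timedelta(seconds=secs)


def daily_bucket_count(n, step_seconds):
    """Number of 1d buckets covered by n documents spaced step_seconds apart."""
    return ((n - 1) * step_seconds) // SECONDS_PER_DAY + 1
```

With 70m documents, daily_bucket_count gives 811 daily buckets for uniform-1s and 8102 for uniform-10s, whereas both sameday datasets land in a single daily bucket.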
Each dataset is loaded in its own index configured with a single shard and no replica. The test cluster is made of a single ES node with the default configuration as shipped by Elastic.
We ran the following query against the 4 datasets with query caching disabled:
{
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 1
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
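A sketch of how this query could be sent and timed from Python, using only the standard library. The base URL, the helper names, the use of the request_cache=false URL parameter to bypass the shard request cache, and reading the timing from the took field of the response are assumptions; this is not necessarily how the numbers in this post were collected.

```python
import json
import urllib.request


def build_search_request(base_url, index, interval="1d"):
    """Build a POST _search request carrying the date_histogram query."""
    body = {
        "aggs": {
            "2": {
                "date_histogram": {
                    "field": "@timestamp",
                    "interval": interval,
                    "min_doc_count": 1,
                }
            }
        },
        "query": {"match_all": {}},
    }
    # request_cache=false asks Elasticsearch not to use the shard request cache.
    url = f"{base_url}/{index}/_search?request_cache=false"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_time(request):
    """Send the request and return the server-side 'took' time in milliseconds."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["took"]
```

For example, run_and_time(build_search_request("http://localhost:9200", "uniform-1s")) would return the took value for one run against a local node, assuming an index named after the dataset.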
The table below shows the timing in millis:
dataset             6.8.2    7.3.0
--------------------------------------
gaussian-sameday     6520    11450
uniform-sameday      6280    12013
uniform-1s          34836    16199
uniform-10s         34211    16934
As you can see, 6.8.2 is nearly twice as fast as 7.3.0 when dates are close to each other (sameday datasets). However, 7.3.0 is about twice as fast as 6.8.2 when they are spread over a larger time range.
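The ratios behind that statement can be recomputed from the table with a few lines of Python. The variable and function names are only illustrative; the numbers are the timings reported above.

```python
# Timings in milliseconds from the table: (6.8.2, 7.3.0)
timings_ms = {
    "gaussian-sameday": (6520, 11450),
    "uniform-sameday": (6280, 12013),
    "uniform-1s": (34836, 16199),
    "uniform-10s": (34211, 16934),
}


def ratio_73_over_68(old_ms, new_ms):
    """Return the 7.3.0 time divided by the 6.8.2 time (>1 means 7.3.0 is slower)."""
    return new_ms / old_ms


ratios = {name: ratio_73_over_68(old, new) for name, (old, new) in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name:<18} {ratio:.2f}x")
```

On the sameday datasets 7.3.0 takes roughly 1.8x to 1.9x as long as 6.8.2, while on the uniform-1s and uniform-10s datasets it takes a little under half the time.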
We ran the same queries with different bucket sizes and concluded that the bucket size has little or no impact, as we always observed the same differences between the two versions.
Has anyone experienced the same behaviour?
Is this something we can tune?
Is this a regression?
Thanks for your advice,
/Bertrand
Note: we could not reproduce the issue with the nyc_taxis dataset used by Rally because dates in that dataset are not close enough to each other.
Update: removed the time_zone parameter from the date_histogram since it is not required by the scenario. Timings are obviously different now but the differences between the two ES versions are still the same.