Aggregation query 5x faster when timezone is removed from query

JohanRask · October 20, 2017, 7:58am

It takes Kibana 25 seconds to display the histogram at the top of the Discover pane for our single-shard index of 320 million documents. If I copy the request JSON into Dev Tools and remove the time_zone setting from the aggregation, the same request takes 5 seconds.

320 million docs, spread over ~8 hours, single shard
CentOS on VMware, 8 cores, 64 GB RAM (30 GB heap), Elasticsearch 5.6.2
The 25-second query keeps one core at 100% CPU

Is this expected behaviour or have we done something wrong?

The aggregation part of the request (removing the "time_zone" line is what brings it down to 5 seconds):

  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      }
    }
  }

Regards /Johan
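One plausible reason for the gap is that a named time zone turns a single integer rounding per document into an offset lookup plus two shifts. The Python sketch below only illustrates that idea; it is not Elasticsearch's rounding code, which also handles DST edge cases. `round_utc` and `round_local` are hypothetical names, fixed offsets stand in for a named zone, and reusing the same offset on the way back is a simplification.

```python
from datetime import datetime, timedelta, timezone, tzinfo

INTERVAL_MS = 30 * 60 * 1000  # the "30m" interval from the aggregation above


def round_utc(ts_ms: int, interval_ms: int = INTERVAL_MS) -> int:
    """Bucket key with no time_zone: a single modulo per timestamp."""
    return ts_ms - ts_ms % interval_ms


def round_local(ts_ms: int, tz: tzinfo, interval_ms: int = INTERVAL_MS) -> int:
    """Bucket key for a zone: find the UTC offset in effect at this instant,
    floor the local wall-clock time, then shift the key back to UTC.
    Simplified: the same offset is reused for the shift back."""
    instant = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    offset_ms = instant.astimezone(tz).utcoffset() // timedelta(milliseconds=1)
    local_ms = ts_ms + offset_ms
    return local_ms - local_ms % interval_ms - offset_ms


if __name__ == "__main__":
    berlin_summer = timezone(timedelta(hours=2))  # fixed-offset stand-in for CEST
    nepal = timezone(timedelta(hours=5, minutes=45))
    ts = 1_508_486_280_000  # 2017-10-20T07:58:00Z
    print(round_utc(ts), round_local(ts, berlin_summer), round_local(ts, nepal))
```

In this sketch, any offset that is a whole number of hours (as Europe/Berlin always is) yields exactly the same 30m keys as plain UTC rounding, so the zone only adds work per document; the keys change for offsets such as +05:45 or for intervals such as 1d. Passing `zoneinfo.ZoneInfo("Europe/Berlin")` (Python 3.9+) instead of the fixed offset exercises real DST rules.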

polyfractal · October 27, 2017, 3:27pm

Did you try running the agg with the time_zone in Dev Tools as well? Kibana does a lot of non-query work to display the data visually, so it'd be good to get a baseline for the query itself without the visualization overhead.

Secondly, you may be running into caching. ES (and the filesystem) cache various bits, so the second run may be a lot faster simply because you're hitting a cache. Running the query with and without time_zone several times will give a better indication of whether caching is involved.
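A minimal sketch of such an alternating comparison in Python, assuming a caller-supplied `run` function that sends a request body and returns the elapsed time. `compare`, `WITH_TZ` and `WITHOUT_TZ` are hypothetical names, and `"size": 0` is an assumption made so that only the aggregation is timed (Kibana's own request also fetches hits).

```python
import copy
import statistics
from typing import Callable, Dict

# Request body for the "with time_zone" case; size=0 skips returning hits.
WITH_TZ = {
    "size": 0,
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "30m",
                "time_zone": "Europe/Berlin",
                "min_doc_count": 1,
            }
        }
    },
}
# Identical body with only the time_zone removed.
WITHOUT_TZ = copy.deepcopy(WITH_TZ)
del WITHOUT_TZ["aggs"]["2"]["date_histogram"]["time_zone"]


def compare(run: Callable[[dict], float], variants: Dict[str, dict],
            rounds: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Run every variant `warmup + rounds` times, alternating between them so
    both see similar cache state, and return the median of the timed rounds."""
    samples: Dict[str, list] = {name: [] for name in variants}
    order = list(variants.items())
    for i in range(warmup + rounds):
        # Flip the order every round so neither variant always runs first.
        for name, body in (order if i % 2 == 0 else reversed(order)):
            took = run(body)
            if i >= warmup:  # discard warm-up rounds
                samples[name].append(took)
    return {name: statistics.median(vals) for name, vals in samples.items()}
```

For example, with the third-party `requests` package, `run` could be `lambda body: requests.post("http://localhost:9200/my-index/_search?request_cache=false", json=body).json()["took"]`, where host and index name are placeholders. `took` is the server-side time in milliseconds, and `request_cache=false` keeps the shard request cache out of the measurement, although the filesystem cache still applies.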

That said, can you post the full query and aggregation? It's hard to know whether it should be faster without the full context. The short answer is probably "Yes, expected", simply because you're asking the agg to do less work, but it's hard to say without seeing everything. There may be other clauses in the query that allow it to short-circuit execution once the time zone is removed from the date_histogram.

Lastly, 320M docs in a single shard may be a bit much. You'll most likely get better latency if you split that index into multiple shards (even on a single node), simply because more threads can then work on the search at the same time.
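In 5.6 the shard count is fixed when an index is created, so splitting means creating a new index and reindexing into it. The Python sketch below only builds the two request bodies; the index names are hypothetical, 8 shards is chosen merely to match the 8 cores, and 0 replicas assumes a single-node benchmark. The new index also needs the original mapping (at least `@timestamp` as a `date`), which this sketch leaves out.

```python
import json


def create_index_body(shards: int, replicas: int = 0) -> dict:
    """Body for PUT /<new-index>; number_of_shards can only be set at creation."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def reindex_body(source: str, dest: str) -> dict:
    """Body for POST /_reindex: copy every document from source into dest."""
    return {"source": {"index": source}, "dest": {"index": dest}}


if __name__ == "__main__":
    # Hypothetical index names for the benchmark data.
    print("PUT /bench-8shards")
    print(json.dumps(create_index_body(8), indent=2))
    print("POST /_reindex")
    print(json.dumps(reindex_body("bench-1shard", "bench-8shards"), indent=2))
```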

JohanRask · October 30, 2017, 12:21pm

Hi,

Thanks for your answer!

Yes, I have run the query both with and without the time_zone in Dev Tools; the durations I gave are from Dev Tools.

Regarding caching, I have alternated between the two versions many times with the same result.

We have also tried the same thing locally on another machine with the same result, so it is easy to reproduce. The problem gets worse the more documents you squeeze into a small time interval: getting 300 million documents over a week is no problem, but the same amount over a few hours is a huge problem.
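To put rough numbers on that "squeezing", the sketch below assumes documents are spread evenly and that the same 30m interval is used in both cases (Kibana's automatic interval would usually pick coarser buckets for a week). `bucket_density` is a hypothetical helper.

```python
def bucket_density(total_docs: int, span_hours: float, interval_minutes: int):
    """Number of date_histogram buckets and average docs per bucket,
    assuming documents are spread evenly over the time span."""
    buckets = int(span_hours * 60 // interval_minutes)
    return buckets, total_docs / buckets


if __name__ == "__main__":
    for label, hours in [("8 hours", 8), ("1 week", 7 * 24)]:
        buckets, per_bucket = bucket_density(300_000_000, hours, 30)
        print(f"{label}: {buckets} buckets, ~{per_bucket:,.0f} docs per bucket")
```

The total number of documents is the same in both cases; what changes is how many land in each bucket (about 18.75 million versus about 0.9 million).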

Regarding shard size: this is not a production environment. We are doing some ingestion benchmarking, and I found this by accident because the Kibana results took so long to render. I was unable to find any information about the time zone conversion issue anywhere. It is by no means a blocker, and I will do some more tests in the coming weeks.

Kind regards /Johan

system · November 27, 2017, 12:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category | Replies | Views | Activity
Performance when using `time_zone` in date_historgram | Elasticsearch | 3 | 628 | December 19, 2017
Setting timezone on date_histogram slow down the query | Elasticsearch | 1 | 409 | September 9, 2019
Timed out while getting index list on creating an index pattern in Kibana | Elasticsearch | 11 | 2620 | April 4, 2019
Date histogram in Kibana is very slow | Elasticsearch | 3 | 1242 | March 9, 2017
Discovery: very slow data visualisation | Kibana | 4 | 983 | April 6, 2017