Aggregation query 5x faster when timezone is removed from query

It takes Kibana 25 seconds to display the histogram at the top of the Discover pane for our single-shard index of 320 million documents. If I copy the request JSON into Dev Tools and remove the time_zone setting from the aggs query, it takes 5 seconds.

  • 320 million docs, spread over ~8 hours, single shard
  • Running on CentOS (VMware), 8 cores, 64GB RAM (30GB heap), ES 5.6.2
  • The 25-second query consumes 100% CPU on one core

Is this expected behaviour or have we done something wrong?

  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "time_zone": "Europe/Berlin",   <-- Remove
        "min_doc_count": 1
      }
    }
  }

Regards /Johan

Did you try running the agg with the time zone in Dev Tools? Kibana does a lot of non-querying work to display the data visually, so it'd be good to get a baseline speed for the query without the visualization overhead.

Secondly, you may be running into caching. ES (and the filesystem) cache various bits, so the second run may be a lot faster simply because you're hitting a cache. Running the query with and without the time zone several times will give a better indication of whether caching is involved.
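The with/without comparison described above could be scripted roughly like this. It is only a sketch: `run_query` is a hypothetical callable that you would supply (for example, one that POSTs the body to the index's `_search` endpoint), and `"size": 0` is an assumption, since the full Kibana request is not shown in this thread.

```python
import copy
import statistics
import time

# Aggregation taken from the original post. "size": 0 is an assumption:
# the full request body was not posted in the thread.
BASE_BODY = {
    "size": 0,
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "interval": "30m",
                "time_zone": "Europe/Berlin",
                "min_doc_count": 1,
            }
        }
    },
}


def without_time_zone(body):
    """Return a deep copy of body with time_zone removed from each top-level date_histogram."""
    stripped = copy.deepcopy(body)
    for agg in stripped.get("aggs", {}).values():
        agg.get("date_histogram", {}).pop("time_zone", None)
    return stripped


def benchmark(run_query, body, repeats=5):
    """Call run_query(body) `repeats` times and return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(body)  # hypothetical: sends the search and waits for the response
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report the first (possibly cold) run separately from the median of the later runs."""
    warm = timings[1:] or timings
    return {"first": timings[0], "warm_median": statistics.median(warm)}
```

If the first run is much slower than the warm median for both variants, caching is probably involved; if the variant with the time zone stays slow on every run, it probably is not. Sending the search with `request_cache=false` in the query string and comparing the `took` value in each response should help separate server-side time from client overhead.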

That said, can you post the full query/agg? It's hard to know whether the speedup is expected without the full context. The short answer is probably "Yes, expected", simply because you're asking the agg to do less work. There may also be other clauses in the query that allow it to short-circuit execution once the date_histogram is gone.

Lastly, 320m docs in a single shard may be a bit much. You'll certainly get better latency if you split that index into multiple shards (even on a single node), simply because it allows more threads to do work at the same time.
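As a rough sketch of what splitting the data could look like, the snippet below builds the request bodies for creating a multi-shard index and copying the existing documents into it with the reindex API. The index names and the shard count of 8 (one per core on the machine described above) are assumptions for illustration, not a recommendation from this thread.

```python
import json


def create_index_body(number_of_shards, number_of_replicas=0):
    """Settings body for creating an index (PUT /<index>) with the given shard count."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex that copies documents from source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Hypothetical index names; 8 shards is only a starting guess for an 8-core box.
    print(json.dumps(create_index_body(8), indent=2))
    print(json.dumps(reindex_body("logs-single-shard", "logs-8-shards"), indent=2))
```

Rerunning the same aggregation against the new index would show how much of the 25 seconds comes from a single thread doing all the work.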

Hi,

Thanks for your answer!

Yes, I have tested running the query both with and without the time zone in Dev Tools; the durations I quoted are from Dev Tools.

Regarding caching, I have switched back and forth many times with the same result.

We have also tried the same thing locally on another machine with the same result, so it is easy to reproduce. The problem gets worse the more documents you squeeze into a small time interval. Getting 300,000,000 over a week is no problem, but the same amount over a few hours is a huge problem.

Regarding shard size, this is not a production environment. We are doing some ingestion benchmarking, and I found this by accident: the Kibana results took very long to render, and I was unable to find any information about the time zone conversion issue anywhere. It is by no means a blocker, and I will do some more tests in the upcoming weeks.

Kind regards /Johan
