Slow Aggregation over Time-based Indices. Each index has ~10M records

durgaharish1993 · September 16, 2019, 4:18am

I am using monthly time-based indices (e.g. prefix-01-2019, prefix-02-2019, and so on) to store time-based data. Each index has ~10 million documents with 13 fields per document. I am running nested aggregation queries across multiple indices, but the query time is ~10-15 seconds, which is quite slow.

Here is the link to the query: https://gist.github.com/durgaharish1993/8af7256f02c44988b3f1e09ada370323

date_value - Field name of date
dimension1 - It is a text field with cardinality ~ 5000
dimension2 - It is a text field with cardinality = 6
metric1 - It is a floating point number

Please let me know if there are ways to improve the query time.

Elasticsearch cluster details
4 Data node cluster

Christian_Dahlqvist · September 16, 2019, 5:10am

That sounds very slow given the amount of data. Since you are querying across 2 months' worth of data, I would expect only 2 monthly indices to be involved. If you have 1 primary shard per index (which seems appropriate given the number of documents per index), 2 shards will be queried to serve the request. As a query against each shard is single-threaded, only 2 threads across your 4 nodes will be busy with the query. If you also have slow storage, this could perhaps explain the slow response times.
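The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the original reply: the helper names `monthly_indices` and `shards_queried` are made up here, the index pattern `prefix-MM-YYYY` is taken from the question, and the July-August 2019 range is an arbitrary two-month example.

```python
from datetime import date


def monthly_indices(prefix, start, end):
    """Return index names 'prefix-MM-YYYY' for every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{month:02d}-{year}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def shards_queried(index_count, primary_shards_per_index):
    """One copy of every shard in every targeted index is searched by a single thread."""
    return index_count * primary_shards_per_index


# Hypothetical two-month range, used only for illustration.
indices = monthly_indices("prefix", date(2019, 7, 1), date(2019, 8, 31))
print(indices)                          # index names to target in the search
print(shards_queried(len(indices), 1))  # busy threads with 1 primary shard per index
print(shards_queried(len(indices), 3))  # busy threads with 3 primary shards per index
```

Targeting only the indices that cover the requested date range keeps the shard count, and therefore the number of search threads involved, as small as the data allows.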

durgaharish1993 · September 16, 2019, 5:46am

Hi Christian, thanks for the reply. You are correct: I ran multiple tests on 2 months of data. Just to clarify, the response times are ~10 s to 15 s.
I am using 3 primary shards per index. Do you think reducing the number of primary shards to 1 will speed up the response time?

Any kind of help would be great.

Christian_Dahlqvist · September 16, 2019, 5:50am

No, quite the opposite. I would expect that to give worse performance.

What type of hardware is your cluster deployed on? What kind of storage are you using? What is the load on the cluster when you are querying?

durgaharish1993 · September 16, 2019, 6:02am

Thanks for confirming. We are using a 4-node cluster on DigitalOcean Standard Droplets with the following configuration:

32 GB RAM, 8 CPUs x 1 node - acts as master
8 GB RAM, 4 CPUs x 3 nodes - act as data nodes

Storage - We are using SSD storage.
There was no other activity in the cluster when we were testing/running these queries. All 4 nodes are dedicated to Elasticsearch.

Christian_Dahlqvist · September 16, 2019, 6:08am

You should always look to have 3 master-eligible nodes in a cluster, as having a single master-eligible node is very bad and can lead to data loss. As dedicated master nodes should not serve requests, they can generally be smaller than the data nodes, which do all the hard work.

I am not familiar with DigitalOcean's hosts, but if you have networked SSD storage, this can still be the bottleneck. I would recommend looking at disk utilization and iowait, e.g. using iostat or similar tools.
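For readers who want to check iowait without iostat, here is a minimal Python sketch, assuming a Linux data node where `/proc/stat` is available. It is not from the original reply; the function names are hypothetical. It compares two samples of the aggregate `cpu` line, whose fifth counter is iowait.

```python
import time


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines of /proc/stat.

    Counter order on the line: user nice system idle iowait irq softirq steal ...
    """
    b = [int(x) for x in before.split()[1:]]
    a = [int(x) for x in after.split()[1:]]
    delta = [y - x for x, y in zip(b, a)]
    # Sum the first eight counters; guest time is already included in user time.
    total = sum(delta[:8])
    if total <= 0:
        return 0.0
    return 100.0 * delta[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval_seconds=5.0):
    """Take two samples interval_seconds apart and return the iowait percentage."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return iowait_percent(before, read_cpu_line())


# Example (run on a data node while the slow aggregation is executing):
#   print(sample_iowait(5.0))
```

A consistently high value while the query runs would point toward storage as the bottleneck; a low value suggests looking elsewhere, such as CPU or the query itself.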

system · October 14, 2019, 6:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
