Not sure if Elasticsearch itself is the limiting factor, but when running a big aggregation (100 million docs, a unique count on a high-cardinality field, using doc values), we see the following:

- it takes 1 minute
- CPU sits at about 50%
- RAM and heap usage are stable (no spikes at all)
- I/O peaks at 30 MB/s

I am wondering if there is something that could be limiting performance. (We have SSDs that should deliver 500 MB/s, so I am quite surprised.)
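For reference, the aggregation is roughly of this shape (the index and field names below are placeholders, not our real mapping):

```bash
# Cardinality aggregation over a high-cardinality field; size=0 skips returning hits
curl -s -X POST "localhost:9200/my-index/_search?size=0" \
  -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "unique_values": {
      "cardinality": { "field": "some_high_cardinality_field" }
    }
  }
}'
```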
How many shards are you aggregating across? How many nodes do you have in the cluster?
Queries and aggregations are executed in a single thread per shard, but different shards can be processed in parallel. If you are aggregating across only a few shards per node and issue a single query at a time, you can therefore be CPU-limited even though overall CPU on the node is far from maxed out.
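You can check how the shards of the index are distributed across nodes with the _cat API (index name is a placeholder):

```bash
# One row per shard copy, including the node each shard is allocated to
curl -s "localhost:9200/_cat/shards/my-index?v"
```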
Is that 21 primary shards? How many CPU cores does each node have?
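If you are not sure about the core count, the nodes info API reports the processor counts Elasticsearch detects on each node:

```bash
# Look for available_processors / allocated_processors per node
curl -s "localhost:9200/_nodes/os?pretty"
```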
I would also recommend looking at iostat to see what disk utilisation and await look like. Read throughput on its own may not be the best measure, as this kind of load typically consists of a lot of small random-access reads.
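For example, on Linux, while the aggregation is running:

```bash
# Extended per-device stats every second; watch the await and %util columns
iostat -x 1
```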