Processing concentration on some cluster nodes

Claudio_Ranieri · August 20, 2018, 6:39pm

Hi, we currently have an Elasticsearch cluster consisting of 18 data nodes and 3 master nodes (m4.4xlarge instances with 16 cores and 64 GB RAM, distributed across 2 zones in AWS). We use Elasticsearch version 5.6.2 with Java JRE 1.8.0_162 64-bit (with -Xms30g -Xmx30g). The application connects to the cluster via the Java transport client (through the data nodes). Heavy searches basically access 2 indexes. Each index has 12 shards and 2 replicas. One index has 37 million documents and 78 GB of data, and the other has 49 million documents and 230 GB of data.
In times of high load, we notice that 3 to 4 nodes run at close to 100% CPU while the other nodes in the cluster are at 65 to 70%. The total latency of the cluster goes up. The high processing load on a few nodes limits the total throughput of the cluster. Is there any reason for processing to stay concentrated on some nodes in the cluster? How could we better distribute the processing across the cluster?
We have already tried connecting to the cluster through a coordinating node, but the CPU concentration occurs in the same way.
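
For reference, the shard arithmetic implied by the setup above can be sketched as follows. This is only a back-of-the-envelope calculation: the index names are hypothetical, and it assumes the quoted sizes are primary data only, which the post does not state.

```python
# Figures taken from the post above; index names are placeholders.
DATA_NODES = 18
PRIMARIES_PER_INDEX = 12
REPLICAS_PER_PRIMARY = 2
INDEX_SIZES_GB = {"index_a": 78, "index_b": 230}  # assumed to be primary data only


def shard_layout(primaries, replicas, nodes):
    """Return (total shard copies, copies per node if perfectly balanced)."""
    copies = primaries * (1 + replicas)
    return copies, copies / nodes


copies, per_node = shard_layout(PRIMARIES_PER_INDEX, REPLICAS_PER_PRIMARY, DATA_NODES)
print(f"{copies} shard copies per index, {per_node} per data node")

# Average shard size per index under the primary-data assumption.
shard_sizes = {
    name: round(size_gb / PRIMARIES_PER_INDEX, 1)
    for name, size_gb in INDEX_SIZES_GB.items()
}
print(shard_sizes)
```

With a perfectly even allocation each data node would hold 2 shard copies of each index, so 4 copies in total, but the shards of the larger index would be roughly three times the size of those of the smaller one.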

Christian_Dahlqvist · August 20, 2018, 7:24pm

Are shards distributed evenly across the cluster? Are you using any features that can cause uneven load across an index, e.g. routing or parent-child? Are you performing a lot of scripted updates?
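
One way to check the first question with numbers is to count shard copies per node from the `_cat/shards` API. The sketch below is a minimal example, not a definitive tool: the base URL, node names and index names are assumptions, and the sample rows are hand-written to mimic the JSON output of `_cat/shards?format=json`.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_shards(base_url):
    """Fetch shard rows from the _cat/shards API as a list of dicts."""
    url = base_url + "/_cat/shards?format=json&h=index,shard,prirep,state,node"
    with urlopen(url) as resp:
        return json.load(resp)


def copies_per_node(rows, index=None):
    """Count STARTED shard copies per node, optionally for a single index."""
    counts = Counter()
    for row in rows:
        if row.get("state") != "STARTED":
            continue  # skip unassigned, initializing or relocating copies
        if index is not None and row.get("index") != index:
            continue
        counts[row["node"]] += 1
    return counts


# Hand-written sample rows (hypothetical names). Against a live cluster you
# would use: rows = fetch_shards("http://localhost:9200")
rows = [
    {"index": "index_a", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-1"},
    {"index": "index_a", "shard": "0", "prirep": "r", "state": "STARTED", "node": "data-2"},
    {"index": "index_a", "shard": "1", "prirep": "p", "state": "STARTED", "node": "data-2"},
    {"index": "index_b", "shard": "0", "prirep": "p", "state": "STARTED", "node": "data-1"},
    {"index": "index_b", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]
print(copies_per_node(rows))
print(copies_per_node(rows, index="index_a"))
```

Counting per index matters here because an even total per node can still hide an uneven spread of the larger, more expensive index.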

Claudio_Ranieri · August 20, 2018, 7:53pm

Hi Christian,

Are shards distributed evenly across the cluster?
Yes

Are you using any features that can cause uneven load across an index, e.g. routing or parent-child?
No

Are you performing a lot of scripted updates?
In our tests there is no indexing. We do not use scripts in the queries, only aggregations and function score.

Christian_Dahlqvist · August 21, 2018, 5:52am

If you run the hot threads API on one of the busy nodes and compare it to one of the less busy ones, do you see any differences? Is it always the same nodes that are more busy? Do you see any differences in I/O performance between the nodes? Are requests evenly distributed across all nodes in the cluster?
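
A rough way to put numbers on these questions is sketched below. It builds the hot threads URL for a given node and compares two snapshots of the `search` thread pool from `_cat/thread_pool` to see how many search tasks each node completed in between. The base URL, node names and sample counts are assumptions, and as far as I know the `completed` counter reflects shard-level search tasks rather than client requests.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def hot_threads_url(base_url, node_name, threads=5):
    """Build the hot threads URL for one node (the response is plain text)."""
    return f"{base_url}/_nodes/{quote(node_name)}/hot_threads?threads={threads}"


def fetch_search_pool(base_url):
    """Fetch one snapshot of the search thread pool, one row per node."""
    url = (base_url + "/_cat/thread_pool/search?format=json"
           "&h=node_name,active,queue,rejected,completed")
    with urlopen(url) as resp:
        return json.load(resp)


def completed_delta(before, after):
    """Search tasks completed per node between two snapshots."""
    start = {row["node_name"]: int(row["completed"]) for row in before}
    return {
        row["node_name"]: int(row["completed"]) - start.get(row["node_name"], 0)
        for row in after
    }


# Hand-written sample snapshots (hypothetical names and counts). Against a
# live cluster, call fetch_search_pool() twice, some seconds apart.
before = [{"node_name": "data-1", "completed": "1000"},
          {"node_name": "data-2", "completed": "1000"}]
after = [{"node_name": "data-1", "completed": "1900"},
         {"node_name": "data-2", "completed": "1300"}]
print(completed_delta(before, after))
print(hot_threads_url("http://localhost:9200", "data-1"))
```

If the deltas differ widely between nodes during a load test, the imbalance is in how shard-level work is routed. If they are similar while CPU still differs, the cost per task is what varies.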

Claudio_Ranieri · August 21, 2018, 2:59pm

If you run the hot threads API on one of the busy nodes and compare it to one of the less busy ones, do you see any differences?
I looked, but I did not notice any relevant difference.

Is it always the same nodes that are more busy?
In general they are the same machines, but it is not a rule

Do you see any differences in I/O performance between the nodes?
All nodes have the same hardware and configuration

Are requests evenly distributed across all nodes in the cluster?
Yes, but we do not know whether this concentration could be caused by the Java transport client (we list only the IPs of the data nodes) or arises internally within Elasticsearch. At times we have the feeling that Elasticsearch prioritizes machines that hold only replicas, but that is not a rule either. What we notice is that processing is concentrated on some nodes. How does Elasticsearch decide which machines will be used for each request? Is there any prioritization based on response time or on the last machine that responded for a given shard?
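
To make the question concrete, here is a toy model of per-shard copy selection. It is a simplified sketch, not Elasticsearch's actual implementation: it assumes every search touches each shard once and rotates through that shard's copies in turn, and the layouts and node names are made up.

```python
from collections import Counter
from itertools import cycle


def simulate(copies_by_shard, requests):
    """Send each request to one copy of every shard, rotating through copies."""
    pickers = {shard: cycle(nodes) for shard, nodes in copies_by_shard.items()}
    load = Counter()
    for _ in range(requests):
        for shard in copies_by_shard:
            load[next(pickers[shard])] += 1  # one shard-level task per shard
    return load


# Even layout: every node holds one copy of every shard.
even = {0: ["n1", "n2", "n3"], 1: ["n2", "n3", "n1"], 2: ["n3", "n1", "n2"]}

# Skewed layout: n1 holds a copy of every shard, n3 of only one.
skewed = {0: ["n1", "n2"], 1: ["n1", "n3"], 2: ["n1", "n2"]}

print(simulate(even, 6))
print(simulate(skewed, 4))
```

In this model the load per node follows the number of shard copies a node holds for the indexes being searched, so a rotation-style selection alone would not explain a few hot nodes when the allocation is even. The search `preference` parameter is the documented way to override which copies a request uses.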

Claudio_Ranieri · August 28, 2018, 5:44pm

Anyone have any ideas about the problem?

