Processing concentration on some cluster nodes - The return

Claudio_Ranieri · October 15, 2018, 4:57pm

Hi,
I previously opened the topic "Processing concentration on some cluster nodes" about processing being concentrated on a few nodes of our Elasticsearch cluster. We ran several tests and noticed the following behavior: with only 1 replica, the concentration on a few nodes does not occur. It only occurs with 2 replicas. As mentioned in the previous topic, we have 2 main indices with 12 shards each, and 18 data nodes. We also tested accessing the cluster directly over REST, because we suspected a problem with the transport client, but the concentration on a few nodes occurs over REST too. Could there be a bug when using 2 replicas? (Most people probably use only 1 replica.)

DavidTurner · October 15, 2018, 6:42pm

The situation described in your earlier thread sounds like a good candidate for adaptive replica selection (see also the blog post about how it works). This was added in 6.1.0 but you were on 5.6.2 when you last asked. Can you upgrade?

Claudio_Ranieri · October 15, 2018, 7:21pm

Hi David,

We are currently unable to upgrade Elasticsearch to version 6.x because our application relies heavily on the Transport Client (Java API) and we use types. With only 1 replica the whole cluster is balanced. Why does the cluster become unbalanced when we use 2 replicas?

Christian_Dahlqvist · October 15, 2018, 7:24pm

If you go from 1 to 2 replicas you are increasing the amount of data stored on disk by 50%. Could it be that this leads to increased disk I/O, which results in higher load?
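To make the 50% figure concrete, here is a rough sketch of the arithmetic using the numbers from the first post (2 main indices, 12 primary shards each, 18 data nodes). It assumes all shards are about the same size and ignores any other indices on the cluster, neither of which has been confirmed in this thread.

```python
# Figures from the first post: 2 main indices, 12 primary shards each, 18 data nodes.
INDICES = 2
PRIMARIES_PER_INDEX = 12
DATA_NODES = 18


def shard_copies(replicas: int) -> int:
    """Total shard copies (primaries plus replicas) across both indices."""
    return INDICES * PRIMARIES_PER_INDEX * (1 + replicas)


for replicas in (1, 2):
    total = shard_copies(replicas)
    print(f"{replicas} replica(s): {total} copies, "
          f"{total / DATA_NODES:.2f} per node on average")

# Relative growth in stored data when going from 1 to 2 replicas
# (assumption: every shard copy has roughly the same size).
growth = shard_copies(2) / shard_copies(1) - 1
print(f"increase in stored data: {growth:.0%}")
```

Going from 48 to 72 shard copies is where the 50% comes from, and the average per node rises from about 2.67 to 4.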

Claudio_Ranieri · October 15, 2018, 7:32pm

The disks are SSDs, so I think the influence would be small, especially on CPU consumption. Our machines have 64 GB of RAM, with the JVM using 30 GB (leaving roughly 32 GB for mmap).

DavidTurner · October 15, 2018, 9:22pm

This is certainly a puzzle. I wonder if perhaps searching one of your indices is more expensive than the other one, and the allocation of the shards is such that the expensive shards are more concentrated on the few problematic nodes. Elasticsearch balances the cluster based on shard count, considering all shards as basically equal, so this is possible.
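One way to check this is to count how many shards of each index sit on each node. The sketch below assumes you have saved the output of `GET /_cat/shards?format=json` and loaded it as a list of rows. The index and node names in the sample are hypothetical, not taken from your cluster.

```python
from collections import Counter, defaultdict


def shards_per_node(rows):
    """Count started shard copies per node, broken down by index."""
    counts = defaultdict(Counter)
    for row in rows:
        # Skip unassigned or relocating copies; they have no settled node.
        if row.get("state") == "STARTED":
            counts[row["node"]][row["index"]] += 1
    return {node: dict(per_index) for node, per_index in counts.items()}


# Hypothetical rows in the shape returned by GET /_cat/shards?format=json
sample = [
    {"index": "index_a", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "index_a", "shard": "1", "prirep": "r", "state": "STARTED", "node": "node-1"},
    {"index": "index_b", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "index_b", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-2"},
    {"index": "index_b", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

for node, per_index in sorted(shards_per_node(sample).items()):
    print(node, per_index)
```

If the busy nodes turn out to hold noticeably more shards of one index than the quiet nodes do, that would support the uneven-cost theory.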

I also wonder whether the busier nodes are busier simply because they see more searches (and related activity) because of some kind of routing oddity. You can find this out by looking at the nodes statistics:

GET /_nodes/stats?filter_path=nodes.*.indices.search

This shows cumulative statistics about each node, so you need to look at two consecutive outputs some time apart and take deltas. Is there any significant difference here between the busier nodes and the quieter ones?
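Taking the deltas by hand across 18 nodes is tedious, so here is a minimal sketch of the comparison. It assumes the two responses have already been fetched and parsed into dicts, and it only looks at the `query_total` and `query_time_in_millis` counters. The node IDs and numbers in the sample are made up for illustration.

```python
def search_deltas(before, after):
    """Per-node change in cumulative search counters between two snapshots."""
    deltas = {}
    for node_id, node in after["nodes"].items():
        if node_id not in before["nodes"]:
            continue  # node joined between snapshots, so there is no baseline
        new = node["indices"]["search"]
        old = before["nodes"][node_id]["indices"]["search"]
        deltas[node_id] = {
            "queries": new["query_total"] - old["query_total"],
            "query_ms": new["query_time_in_millis"] - old["query_time_in_millis"],
        }
    return deltas


# Made-up snapshots in the shape of GET /_nodes/stats?filter_path=nodes.*.indices.search
before = {"nodes": {
    "n1": {"indices": {"search": {"query_total": 1000, "query_time_in_millis": 5000}}},
    "n2": {"indices": {"search": {"query_total": 1000, "query_time_in_millis": 5000}}},
}}
after = {"nodes": {
    "n1": {"indices": {"search": {"query_total": 1900, "query_time_in_millis": 9500}}},
    "n2": {"indices": {"search": {"query_total": 1100, "query_time_in_millis": 5400}}},
}}

# Print the busiest nodes first.
for node_id, d in sorted(search_deltas(before, after).items(),
                         key=lambda item: item[1]["queries"], reverse=True):
    print(node_id, d)
```

Sorting by the query delta puts the nodes that served the most shard-level searches in the interval at the top, which makes a routing imbalance easy to spot.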

DavidTurner · October 15, 2018, 9:29pm

One effect of increasing the data on each node is that it's easier to run out of filesystem cache, which certainly can affect performance. The balanced performance with 1 replica might be a distraction - it's puzzling that there are these hotspots with any number of replicas.
