Specific node is working "harder" than the others in the cluster

Hi,

I have an Elasticsearch cluster (version 2.3.4) with a custom app pulling data from it and a Couchbase cluster pushing data to it (once an hour). The cluster runs on Amazon EC2 machines with identical specs and settings. After a few hours one of the nodes seems to work "harder" than the others: in the monitoring plugins (KOPF, Elastic HQ) I can see its load is constantly high, and once every few days the number of "Field Evictions" rises.

While I understand this is an indication of a lack of memory (which leads to high IOPS and high CPU), I'd like to know why only one (specific) node shows these symptoms and why the load isn't spreading across the cluster. If I restart the cluster, another node will show the same symptoms a few days later, until the cluster is restarted again.
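(For reference, a per-node view of the fielddata cache can be pulled with the _cat API; this is the kind of check I'd use to spot the node that keeps evicting. "localhost" below stands in for any node in the cluster:

curl -s 'http://localhost:9200/_cat/fielddata?v&fields=*'

The node with the largest and fastest-growing totals should be the one showing the evictions.)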

Settings:
M4 instances
16 GB RAM
150 IOPS
4 nodes

index.number_of_shards: 5
index.number_of_replicas: 2
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
indices.breaker.fielddata.limit: "60%"
ES_HEAP_SIZE=7g
MAX_OPEN_FILES=65536
MAX_LOCKED_MEMORY=unlimited

Elasticsearch version 2.3.4

Thanks for the help in advance.

Impossible to say without more info.

Are all your apps using proper load balancing?

Thank you for the quick reply. I forgot to mention the version of Elasticsearch we're using: 2.3.4 (added to the original post too).

Yes, the app is allowed to access any/all of the Elasticsearch instances, and the instances aren't limited to a specific role (master, data storage, router).
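(As a rough sanity check that the requests really are spread, the per-node HTTP stats can be compared, e.g.:

curl -s 'http://localhost:9200/_nodes/stats/http?pretty'

If one node shows far more opened connections than the others, the clients are effectively pinned to it. The hostname/port here is a placeholder.)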

After we restart the cluster (gradually), the load will "bounce" to another instance and will stay there until we restart the cluster again.

You have a skew in your design: 4 nodes but 5 shards. So one node must hold two shards and carry roughly double the load.

Golden rule: always align the shard count with the number of data nodes, either at a 1:1 ratio (which is easiest) or, in the case of many indices, at a 1:n ratio, so that each data node holds the same number of shards.
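For example, with your 4 data nodes, an index created with 4 primary shards distributes evenly (a sketch only; "my_index" is a placeholder, and since the shard count of an existing index cannot be changed, this means reindexing):

curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "index.number_of_shards": 4,
    "index.number_of_replicas": 2
  }
}'

With 4 primaries and 2 replicas that is 12 shard copies in total, i.e. 3 per node.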

Besides that, I would strongly recommend setting up an odd number of master-eligible nodes to make the distributed system resilient against split-brain situations. See the minimum_master_nodes setting.
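For instance, with 3 of the 4 nodes left master-eligible, the quorum is (3 / 2) + 1 = 2, so the elasticsearch.yml lines would look roughly like this (a sketch only):

# on the three master-eligible nodes
discovery.zen.minimum_master_nodes: 2

# on the fourth node, to take it out of master elections
node.master: false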
