Unbalanced cluster nodes


(piter) #1

For some days now my cluster has been in this state:

[root@elk ~]# curl -XGET 'localhost:9200/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
192.168.0.51 192.168.0.51 17 93 25.16 d m clusternode1
192.168.0.58 192.168.0.58 39 99 0.51 d - clusternode8
192.168.0.59 192.168.0.59 15 96 0.00 - m clusternode9
192.168.0.56 192.168.0.56 38 93 0.52 d - clusternode6
192.168.0.54 192.168.0.54 42 96 0.10 d - clusternode4
192.168.0.52 192.168.0.52 27 84 0.26 d m clusternode2
192.168.0.82 192.168.0.82 16 92 0.12 - - balancernode
192.168.0.57 192.168.0.57 29 96 0.04 - * clusternode7
192.168.0.53 192.168.0.53 64 100 0.03 d - clusternode3
192.168.0.55 192.168.0.55 35 30 0.27 d - clusternode5

Clusternode1 is overloaded while the others seem to be doing no work.


(piter) #2

Any help?


(Christian Dahlqvist) #3

You have not provided a lot of information to go on, so I am not surprised that no one has been able to help. I would recommend answering the following questions:

  • Which version of Elasticsearch are you using?
  • What is the specification of your nodes/cluster?
  • What is your use case?
  • Is there anything in the Elasticsearch logs on the node that is highly loaded?
  • Is there anything, e.g. with respect to shard distribution, data volume or configuration, which sets this node apart from the others?

(piter) #4

Thanks Christian,

  • Which version of Elasticsearch are you using? 2.4.2
  • What is the specification of your nodes/cluster? All nodes are VMs with Intel Xeon CPUs, 8 GB of RAM, and a 1 TB disk each.
  • What is your use case? I use Elasticsearch to analyze syslog data and search for alerts.
  • Is there anything in the Elasticsearch logs on the node that is highly loaded? I don't know. Can you explain how to find it?
  • Is there anything, e.g. with respect to shard distribution, data volume or configuration, which sets this node apart from the others? No. All cluster nodes have identical settings.

My cluster has been active for many years. This is the first time a single node has become overloaded.


(Christian Dahlqvist) #5

Look at the Elasticsearch logs for anything unusual, e.g. errors, warnings, or long/frequent GC. The location depends on how you installed it.

Also look at the hot threads API to see what the node is doing.
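A quick way to scan the logs for such entries (a sketch; the path below is the default for RPM/DEB installs, while tarball installs write to $ES_HOME/logs instead):

```shell
# Default log path for package installs (an assumption; adjust for
# your installation method).
LOG=${LOG:-/var/log/elasticsearch/elasticsearch.log}

# Show the last 20 error, warning, and GC lines.
grep -iE 'warn|error|\[gc\]' "$LOG" 2>/dev/null | tail -n 20
```

Long or frequent `[gc]` entries on the loaded node would point at memory pressure.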


(piter) #6

With the command

curl 'localhost:9200/_nodes/hot_threads'

This is the result:

::: {clusternode1}{Q0cNwhy0Q0yy8Ba6yZoVHA}{192.168.0.11}{192.168.0.11:9500}{zone=A, master=false}
Hot threads at 2018-10-19T11:44:47.084Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

95.1% (475.6ms out of 500ms) cpu usage by thread 'elasticsearch[clusternode1][search][T#25]'
10/10 snapshots sharing following 23 elements
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:162)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:201)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:167)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:119)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

94.3% (471.4ms out of 500ms) cpu usage by thread 'elasticsearch[clusternode1][search][T#1]'
10/10 snapshots sharing following 23 elements
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:162)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:201)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:167)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:119)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

How should I read these results?


(piter) #7

I see some shards relocating away from the overloaded node, but it is taking a long time.
Do hot threads also include relocating shards?


(Christian Dahlqvist) #8

I am not an expert in reading the finer details of the hot threads output, but it looks like the node is busy building global ordinals for an aggregation query. Are you using custom routing or parent-child relationships that could cause such uneven load?


(piter) #9

No, no custom routing or parent-child relationships.


(piter) #10

I think the problem is that all shards of one index are on the same node, so when I run a query on that index the node becomes overloaded.
Is there a way to spread the shards across more nodes?
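One way to check whether the shards of an index really are concentrated on one node (a sketch; the index name is a made-up example) is the cat shards API:

```shell
# List every shard of an index together with the node that holds it.
# 'syslog-2018.10.19' is a hypothetical index name.
curl -XGET 'localhost:9200/_cat/shards/syslog-2018.10.19?v'
```

If the node column shows the same node for every shard, the allocation is indeed skewed.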


(Christian Dahlqvist) #11

You could use the cluster reroute API to move shards around. Another option might be the `index.routing.allocation.total_shards_per_node` setting to force a spread across the cluster.
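Both options sketched as curl calls (index and node names here are hypothetical examples, not taken from the thread):

```shell
# Option 1: move one shard by hand with the cluster reroute API.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "syslog-2018.10.19",
        "shard": 0,
        "from_node": "clusternode1",
        "to_node": "clusternode5"
      }
    }
  ]
}'

# Option 2: cap how many shards of the index may live on one node,
# so the allocator spreads them out by itself.
curl -XPUT 'localhost:9200/syslog-2018.10.19/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```

The cap in option 2 is a hard limit: set it too low for the number of nodes and some shards may stay unassigned.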


(piter) #12

Thanks! That solved it. I modified the index template with "total_shards_per_node":"2".
Now the node seems balanced with the others.
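For the record, a template carrying that setting might look like this on 2.x (the template name and index pattern are invented examples):

```shell
# An index template applying the per-node shard cap to new indices.
# 'syslog_template' and the 'syslog-*' pattern are hypothetical.
curl -XPUT 'localhost:9200/_template/syslog_template' -d '{
  "template": "syslog-*",
  "settings": {
    "index.routing.allocation.total_shards_per_node": 2
  }
}'
```

Note that a template only affects indices created after it is put in place; existing indices need the setting applied via the `_settings` endpoint.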


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.