Unbalanced cluster nodes


(piter) #1

For some days now my cluster has been in this state:

[root@elk ~]# curl -XGET 'localhost:9200/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
192.168.0.51 192.168.0.51 17 93 25.16 d m clusternode1
192.168.0.58 192.168.0.58 39 99 0.51 d - clusternode8
192.168.0.59 192.168.0.59 15 96 0.00 - m clusternode9
192.168.0.56 192.168.0.56 38 93 0.52 d - clusternode6
192.168.0.54 192.168.0.54 42 96 0.10 d - clusternode4
192.168.0.52 192.168.0.52 27 84 0.26 d m clusternode2
192.168.0.82 192.168.0.82 16 92 0.12 - - balancernode
192.168.0.57 192.168.0.57 29 96 0.04 - * clusternode7
192.168.0.53 192.168.0.53 64 100 0.03 d - clusternode3
192.168.0.55 192.168.0.55 35 30 0.27 d - clusternode5

Clusternode1 is overloaded while the others seem to be doing no work.


(piter) #2

Any help?


(Christian Dahlqvist) #3

You have not provided a lot of information to go on, so I am not surprised that no one has been able to help. I would recommend answering the following questions:

  • Which version of Elasticsearch are you using?
  • What is the specification of your nodes/cluster?
  • What is your use case?
  • Is there anything in the Elasticsearch logs on the node that is highly loaded?
  • Is there anything, e.g. with respect to shard distribution, data volume or configuration, which sets this node apart from the others?

(piter) #4

Thanks Christian,

  • Which version of Elasticsearch are you using? 2.4.2
  • What is the specification of your nodes/cluster? All nodes are VMs with Intel Xeon CPUs, 8 GB of RAM, and a 1 TB disk each.
  • What is your use case? I use Elasticsearch to analyze syslog data and search for alerts.
  • Is there anything in the Elasticsearch logs on the node that is highly loaded? I don't know. Can you explain how to find it?
  • Is there anything, e.g. with respect to shard distribution, data volume or configuration, which sets this node apart from the others? No. All cluster nodes have identical settings.

My cluster has been active for many years. This is the first time a single node has become overloaded.


(Christian Dahlqvist) #5

Look at the Elasticsearch logs for anything unusual, e.g. errors, warnings, or long/frequent GC. The location depends on how you installed it.

Also look at the hot threads API to see what the node is doing.
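A quick way to scan the logs for such entries (a sketch; the path below is the default for RPM/DEB installs, while tarball installs write to $ES_HOME/logs instead):

```shell
# Default log path for package installs (an assumption; adjust for
# your installation method).
LOG=${LOG:-/var/log/elasticsearch/elasticsearch.log}

# Show the last 20 error, warning, and GC lines.
grep -iE 'warn|error|\[gc\]' "$LOG" 2>/dev/null | tail -n 20
```

Long or frequent `[gc]` entries on the loaded node would point at memory pressure.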


(piter) #6

With the command

curl 'localhost:9200/_nodes/hot_threads'

This is the result:

::: {clusternode1}{Q0cNwhy0Q0yy8Ba6yZoVHA}{192.168.0.11}{192.168.0.11:9500}{zone=A, master=false}
Hot threads at 2018-10-19T11:44:47.084Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

95.1% (475.6ms out of 500ms) cpu usage by thread 'elasticsearch[clusternode1][search][T#25]'
10/10 snapshots sharing following 23 elements
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:162)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:201)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:167)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:119)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

94.3% (471.4ms out of 500ms) cpu usage by thread 'elasticsearch[clusternode1][search][T#1]'
10/10 snapshots sharing following 23 elements
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:162)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:201)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregatorFactory$1.buildAggregation(AggregatorFactory.java:219)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116)
org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:159)
org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:167)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:119)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:372)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

How should I read these results?


(piter) #7

I see some shards relocating away from the overloaded node, but it is taking a long time.
Do hot threads also include relocating shards?


(Christian Dahlqvist) #8

I am not an expert in reading the finer details of the hot threads output, but it looks like the node is busy building global ordinals for an aggregation query. Are you using custom routing or parent-child relationships that could cause such uneven load?


(piter) #9

No, no custom routing or parent-child relationships.


(piter) #10

I think the problem is that all shards of one index are on the same node, so when I run a query on that index the node becomes overloaded.
Is there a way to spread the shards across more nodes?
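One way to check whether the shards of an index really are concentrated on one node (a sketch; the index name is a made-up example) is the cat shards API:

```shell
# List every shard of an index together with the node that holds it.
# 'syslog-2018.10.19' is a hypothetical index name.
curl -XGET 'localhost:9200/_cat/shards/syslog-2018.10.19?v'
```

If the node column shows the same node for every shard, the allocation is indeed skewed.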


(Christian Dahlqvist) #11

You could use the cluster reroute API to move shards around. Another option might be the `index.routing.allocation.total_shards_per_node` setting to force a spread across the cluster.
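Both options sketched as curl calls (index and node names here are hypothetical examples, not taken from the thread):

```shell
# Option 1: move one shard by hand with the cluster reroute API.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "syslog-2018.10.19",
        "shard": 0,
        "from_node": "clusternode1",
        "to_node": "clusternode5"
      }
    }
  ]
}'

# Option 2: cap how many shards of the index may live on one node,
# so the allocator spreads them out by itself.
curl -XPUT 'localhost:9200/syslog-2018.10.19/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```

The cap in option 2 is a hard limit: set it too low for the number of nodes and some shards may stay unassigned.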


(piter) #12

Thanks! That solved it. I modified the index template with "total_shards_per_node":"2".
Now the node seems balanced with the others.
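For the record, a template carrying that setting might look like this on 2.x (the template name and index pattern are invented examples):

```shell
# An index template applying the per-node shard cap to new indices.
# 'syslog_template' and the 'syslog-*' pattern are hypothetical.
curl -XPUT 'localhost:9200/_template/syslog_template' -d '{
  "template": "syslog-*",
  "settings": {
    "index.routing.allocation.total_shards_per_node": 2
  }
}'
```

Note that a template only affects indices created after it is put in place; existing indices need the setting applied via the `_settings` endpoint.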


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.