One shard getting more load

ruben · December 11, 2015, 8:54am

Hi

We are seeing strange behaviour where all servers holding shard 1 are getting more load than the others.

We are running a cluster of 18 machines (on AWS) with only one index, which has 5 shards. No special routing of documents is used.
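
For context, when no custom routing is used, Elasticsearch chooses the shard from a hash of the document ID modulo the number of primary shards, so documents should spread roughly evenly. The sketch below only illustrates that idea: MD5 from the Python standard library is a stand-in hash (Elasticsearch uses its own hash function), and `pick_shard`, `shard_histogram` and the document IDs are made up for illustration.

```python
import hashlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard as hash(routing) % number_of_primary_shards.

    MD5 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_primary_shards


def shard_histogram(doc_ids, num_primary_shards):
    """Count how many of the given document IDs land on each shard."""
    return Counter(pick_shard(d, num_primary_shards) for d in doc_ids)


if __name__ == "__main__":
    # Hypothetical IDs; 5 shards as in the cluster described above.
    ids = [f"doc-{i}" for i in range(100_000)]
    for shard, count in sorted(shard_histogram(ids, 5).items()):
        print(shard, count)
```

If the per-shard counts come out close to each other, as they should with any reasonable hash, then an uneven document distribution is an unlikely explanation for one shard being hotter than the rest.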

Any pointers on how to debug this would be greatly appreciated !

anishek · December 11, 2015, 11:10am

We are facing a similar issue where some nodes are more heavily loaded than others, even though all the nodes have all the shards required to serve the query. We are using ES mostly for percolation and have 3 primaries and 8 replicas on a 9-node cluster.

Still, we see certain nodes at 80% usage and others at 5-10%. I am planning to use preference=_local to see if that will help. Will let you know if it helps.

ruben · December 16, 2015, 3:20pm

Thanks for your answer. I don't think preference=_local will help us, as each node only has one shard.

Does anyone know how to check which node acted as the "coordinating node" for a specific search? We are using a Transport client connecting to an ELB with all nodes behind it. The only thing I can think of is that some nodes act as the coordinating node more often than others, and that would generate more load.

anishek · December 17, 2015, 11:50am

Unless the ELB is sending more requests to the overloaded machines, coordination should not increase the load too much on specific machines. Is there a way to get ELB logs showing how requests are distributed across nodes?

I tried the preference=_local option; that did not help us. By default the routing for us is round robin on the transport client, with the "client.transport.sniff" property set to true. We also enabled trace logs on the transport client for some time to see the distribution, and requests are spread pretty evenly in round robin fashion across the nodes we have. Still, certain nodes are always heavily loaded.
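
One rough way to check whether the search work itself is skewed, rather than the client-side distribution, is to compare the per-node search counters from the node stats API. The sketch below assumes `node_stats` is the parsed JSON returned by `GET /_nodes/stats/indices`; `search_load_by_node` is a hypothetical helper name, and the counters are cumulative since node start, so comparing two snapshots taken some minutes apart is more meaningful than a single reading.

```python
def search_load_by_node(node_stats: dict) -> list:
    """Return (node_name, query_total, query_time_ms) tuples, busiest first.

    Assumes node_stats is the parsed JSON of GET /_nodes/stats/indices.
    These counters cover shard-level query phases executed on each node,
    not the coordinating work done for a request.
    """
    rows = []
    for node_id, node in node_stats.get("nodes", {}).items():
        search = node.get("indices", {}).get("search", {})
        rows.append((
            node.get("name", node_id),
            search.get("query_total", 0),
            search.get("query_time_in_millis", 0),
        ))
    # Sort by total time spent in the query phase, highest first.
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample, shaped like the node stats response.
    sample = {
        "nodes": {
            "a1": {"name": "node-1", "indices": {"search": {
                "query_total": 1200, "query_time_in_millis": 90000}}},
            "b2": {"name": "node-2", "indices": {"search": {
                "query_total": 1150, "query_time_in_millis": 8000}}},
        }
    }
    for name, total, millis in search_load_by_node(sample):
        print(name, total, millis)
```

Similar query counts with very different query times, as in the made-up sample, would point at the shard copies on the slow nodes being more expensive to search rather than at uneven request distribution.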

ruben · December 21, 2015, 9:11am

Thanks for the help, luckily it solved itself...
Looks like Elasticsearch did a "big" merge of segments (about 20% less storage used afterwards). After that we see even load on all nodes. Strange that it waited so long; this had been a problem for > 2 weeks.
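
For anyone hitting the same thing, a quick way to spot a shard that is behind on merging is to count segments per shard copy. The sketch below assumes `rows` holds the output of the `_cat/segments` API already parsed into dicts keyed by its column names (`index`, `shard`, `prirep`, `ip`, `segment`); `segments_per_shard_copy` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict


def segments_per_shard_copy(rows):
    """Count segments for each (index, shard, prirep, ip) shard copy.

    Assumes each row is one line of _cat/segments output parsed into a
    dict keyed by the column names.
    """
    counts = defaultdict(int)
    for row in rows:
        key = (row["index"], row["shard"], row["prirep"], row["ip"])
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical rows: shard 1 has more segments than shard 0.
    rows = [
        {"index": "docs", "shard": "0", "prirep": "p", "ip": "10.0.0.1", "segment": "_a"},
        {"index": "docs", "shard": "1", "prirep": "p", "ip": "10.0.0.2", "segment": "_a"},
        {"index": "docs", "shard": "1", "prirep": "p", "ip": "10.0.0.2", "segment": "_b"},
        {"index": "docs", "shard": "1", "prirep": "p", "ip": "10.0.0.2", "segment": "_c"},
    ]
    for key, count in sorted(segments_per_shard_copy(rows).items()):
        print(key, count)
```

A shard copy with noticeably more segments than its peers is a candidate for the kind of pending merge described above.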
