All load is being concentrated on one node?

I have a problem with Elasticsearch: all the load is concentrated on just one node, and if I add new nodes they just sit there doing nothing.
How can I make Elasticsearch distribute the load across all the nodes in the cluster?

I'm running a 3-node cluster.

The cluster is only serving search requests; it's not ingesting or indexing data.
The biggest index is 100 million records.
I'm searching product titles for keywords, nothing complex.

here's the output from:

curl localhost:9200/_cat/nodes?v

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xxx.xxx.xxx.48            9          99  21   39.54   37.08    31.97 mdi       -      ES2
xxx.xxx.xxx.77           11          99  40    9.82   11.21    11.24 mdi       *      ES1
xxx.xxx.xxx.223          11          99  16    2.75    4.05     4.04 di        -      ES3

As you can see, most of the load is on ES2 while ES3 is doing nothing.
I'm sending all the search requests to ES3 in the hope that it would take over some of the work.

All these servers have 24 CPUs and 65 GB RAM.

Running: Elasticsearch 6.4.1

I basically installed Elasticsearch from scratch:
changed /etc/elasticsearch/jvm.options to -Xms24g -Xmx24g,
set discovery.zen.minimum_master_nodes: 2,
made all nodes data nodes, with two master-eligible nodes,
and hooked the nodes up to the cluster as normal.
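For reference, the per-node config looks roughly like this (a sketch; the cluster name is a placeholder and the IPs are the ones above):

# /etc/elasticsearch/jvm.options (heap settings)
-Xms24g
-Xmx24g

# /etc/elasticsearch/elasticsearch.yml
cluster.name: my-cluster            # placeholder
node.master: true                   # true on ES1 and ES2, false on ES3
node.data: true
discovery.zen.ping.unicast.hosts: ["xxx.xxx.xxx.77", "xxx.xxx.xxx.48", "xxx.xxx.xxx.223"]
discovery.zen.minimum_master_nodes: 2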

I've tried changing the number of primary shards from 5 to 10 with 1 replica, but it doesn't make any difference.
I've tried restarting the servers / Elasticsearch multiple times.
The cluster status is green.
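In 6.x the number of primary shards is fixed at index creation, so changing 5 to 10 means creating a new index and reindexing into it, roughly like this (products / products_v2 are placeholder names):

curl -XPUT 'localhost:9200/products_v2' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_replicas": 1
  }
}'

curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": "products" },
  "dest":   { "index": "products_v2" }
}'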

I'm lost as to what to try next!

Are the shards evenly distributed across the nodes? Are all of the nodes sharing the same specification and configuration?
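You can check the allocation with the cat shards API, e.g.:

curl 'localhost:9200/_cat/shards?v'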

Yes, all shards are evenly distributed.

All nodes are exactly the same, with the same config, OS, memory, CPU, etc.

Are you hitting all indices evenly? Are you using some feature that could cause an imbalanced load, e.g. custom routing or parent-child?
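Custom routing would show up as a routing parameter on your index and search requests, something like this (my_index and user123 are just illustrative):

curl -XPUT 'localhost:9200/my_index/_doc/1?routing=user123' -H 'Content-Type: application/json' -d '{"title": "example"}'
curl 'localhost:9200/my_index/_search?routing=user123&q=title:example'

Routing pins documents, and the searches that use it, to a single shard, which can concentrate load on one node.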

I expect some indices are receiving a lot more search requests than others.

I haven't set up any custom routing or parent-child features.

Are you querying using preference, which would cause the same shards to be queried? Are the shards for most frequently queried indices evenly distributed?
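Preference looks like this on a search request (the session ID is illustrative):

curl 'localhost:9200/_search?preference=session_abc123&q=title:shoes'

The same preference string is always routed to the same shard copies, so a fixed value would keep hitting the same nodes.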

This is the search query I'm using:

curl "localhost:9200/_search?q=title:$keyword&from=$start&size=50"

This is the shard distribution on the busy index (screenshot not reproduced here).

What is the output of the hot threads API on the busy node?

Although it is not related, I would also recommend making the third node master-eligible. You always want a minimum of 3 master-eligible nodes in a cluster.
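On the third node that is a one-line change in its elasticsearch.yml, followed by a restart:

node.master: true

With 3 master-eligible nodes, discovery.zen.minimum_master_nodes: 2 remains the correct quorum setting.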

OK, thanks, will do. By the way, thank you for your help with this!

I ran this command...

curl 'localhost:9200/_nodes/ES2/hot_threads'

It came back with...

https://pastebin.com/8yCfeNsY

(too much to paste here)

Here's an example of the load (graph not reproduced; legend below):

blue = ES2
green = ES1
purple = ES3

I cannot see any reason for this unless one of the nodes is misconfigured, there is some hardware issue, or you are using one of the features I mentioned. What does disk I/O look like on the different nodes? Is there anything in the Elasticsearch logs?
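You can sample per-device I/O with iostat from the sysstat package, e.g.:

iostat -x 5 3

(extended device stats, 5-second intervals, 3 samples)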

Here are the disk stats (graph not reproduced; legend below):

black = ES2
purple = ES1
blue = ES3

I'll check the logs..

Also check if you are having any problems with the disks.
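For example (the device name is a placeholder, and smartctl needs smartmontools and usually root):

dmesg -T | grep -i error
smartctl -H /dev/sda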

I can't see anything in the log file.

I found a similar issue here; could it be the version of Java? I'm using the OpenJDK version:

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

I switched to the Oracle JDK on all 3 servers and restarted Elasticsearch, but it made no difference...

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xxx.xxx.xxx.223          12          99  10    2.91    3.28     2.19 mdi       -      ES3
xxx.xxx.xxx.48           11          99  16   39.08   35.10    20.84 mdi       -      ES2
xxx.xxx.xxx.77           13          99  15    9.01    9.31     5.75 mdi       *      ES1

So I shut down ES2 to see what would happen...

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xxx.xxx.xxx.223          12          99  28    9.80    8.25     5.25 mdi       -      ES3
xxx.xxx.xxx.77           13          99  62   39.81   31.41    17.69 mdi       *      ES1

All the load jumped to ES1!

So I shut down ES1...

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xxx.xxx.xxx.223           11          99  28   10.14   11.29     9.00 mdi       *      ES3

Now everything is running fine and stable on just a one-node cluster (ES3)!

Why does adding another node push all the load onto that extra node?

Any ideas?


Can you check which node holds the shards of the index you are indexing into?
