Load not evenly distributed

Hi, I have a cluster of 3+ nodes, with every node holding all roles.
One node runs at up to 90% CPU with frequent garbage collection, the second is somewhat lower but reasonable, and the remaining nodes do almost nothing. If I stop and start nodes, a different node picks up the high load.

Are there any suggestions on how I can even the load over 3+ nodes?

It's running on Linux, each node on its own machine, on v5.5.2.

How are you interacting with the cluster, Kibana, something else?

With X-Pack and Kibana, yes.

Right, but what about writing to the cluster?

It's used for Exceptionless.

Does it do load balancing or does it just talk to a single node?

It does load balancing (I tried listing the IPs of all nodes in the config as well as a round-robin IP).

What does Monitoring tell you about differences in load/indexing/query loads?

At the moment it's on 2 nodes because the 3rd node had almost no load. What I can see is:
Node 1 CPU: 26%, node 2 CPU: 80%. Memory on node 1 drops to almost 0 after garbage collection, whereas node 2 is still at 70%+ after garbage collection.
Request rate: indexing is just over 2k for both and search rate is just over 1k for both.
Not sure how to check query load?

What is the output of the cat nodes API?

heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
          43          97  43    1.27    1.35     1.28 mdi       -
          45          98  81    3.65    3.64     3.80 mdi       *
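For anyone wanting to compare nodes programmatically, here is a minimal sketch that parses cat nodes output like the above. It assumes the text came from something like `GET _cat/nodes?v&h=heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master`; the helper name `parse_cat_table` is made up for illustration, and the sample rows are the two posted in this thread.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (header row first) into dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    # Each data row is zipped against the header, so values stay strings.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# The two rows posted above, with the dash padding removed.
SAMPLE = """\
heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
43 97 43 1.27 1.35 1.28 mdi -
45 98 81 3.65 3.64 3.80 mdi *
"""

rows = parse_cat_table(SAMPLE)
cpus = [int(r["cpu"]) for r in rows]
cpu_spread = max(cpus) - min(cpus)  # gap between busiest and idlest node
print("CPU spread between nodes:", cpu_spread)
```

A large spread on a cluster that is supposedly load balanced is the symptom described in this thread; the elected master is the row with `*` in the `master` column.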

This may be related: https://github.com/elastic/elasticsearch/issues/24642
The fix is in the impending 6.0 release.

Thanks, is there anything I can set to fix it on 5.5.2?

Would removing Kibana (and X-Pack on the nodes) solve the problem, if that's even possible without breaking Elasticsearch? Or would using new nodes without X-Pack/Kibana help?

Two ugly choices:

  • Set replicas to zero to rebalance with all primaries (and no redundancy!)
  • Install a proxy between Kibana and elasticsearch to strip out preference=sessionId parameters

Kibana uses sessionId-based routing to ensure each user revisits the same nodes and has warm caches for their queries, so it is desirable, but the 5.x routing logic for choosing between primary and replica is not ideal when this feature is used.

Note, however, that with many users the collective load should spread evenly across data nodes, whereas a single user will load the cluster unevenly.
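To make the proxy option a bit more concrete, here is a rough sketch of the rewrite step such a proxy would perform. It assumes the preference arrives as a URL query parameter; if the client sends it inside the request body instead, the body would need similar treatment. The function name `strip_preference` and the example URL are hypothetical, and this is only the URL-rewriting piece, not a working proxy.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_preference(url):
    """Return the URL with any 'preference' query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "preference"
    ]
    # Rebuild the URL with the remaining parameters in their original order.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical request URL as a proxy might see it.
cleaned = strip_preference(
    "http://es:9200/logs/_search?preference=1504000000000&timeout=30s"
)
print(cleaned)
```

Keep in mind that dropping the preference also gives up the warm-cache benefit mentioned above, which is why this counts as an ugly choice.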


Thank you!

I don't see how it can be Kibana, because Exceptionless connects directly to the nodes, so its traffic shouldn't be affected by Kibana.

You said you had 3 nodes in the cluster, but I only see two here? Have you set minimum_master_nodes correctly to avoid split brain scenarios?

Session-ID-based query routing is a feature supported by Elasticsearch and used by Kibana. It may also be used by Exceptionless.

Yes, minimum_master_nodes is set up correctly. It used to be 3 nodes, but I stopped one because it was literally doing nothing (5% CPU and 4 garbage collections a day). I'm aware it's not an ideal setup at the moment. minimum_master_nodes is set to 2, so split brain should not be a problem.

The number of shards is set to the number of nodes so, as I understand it, the primary shards of the index should be split between all nodes.
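Whether the primaries really are spread out can be checked rather than assumed. Below is a small sketch that counts started primaries per node from the output of `GET _cat/shards?h=index,shard,prirep,state,node` (no header row). The function name `primaries_per_node` is made up, and the sample rows are invented purely for illustration; they are not from this cluster.

```python
from collections import Counter


def primaries_per_node(cat_shards_text):
    """Count STARTED primary shards per node from _cat/shards output.

    Expects the columns index, shard, prirep, state, node, without a header.
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # unassigned shards have no node column
        index, shard, prirep, state, node = fields[:5]
        if prirep == "p" and state == "STARTED":
            counts[node] += 1
    return counts


# Invented sample: both primaries happen to sit on the same node.
SAMPLE_SHARDS = """\
events 0 p STARTED node-1
events 0 r STARTED node-2
events 1 p STARTED node-1
events 1 r STARTED node-2
"""

primary_counts = primaries_per_node(SAMPLE_SHARDS)
print(dict(primary_counts))
```

If one node holds most of the primaries, that would fit the uneven load described earlier in the thread.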

No, but if you lose one master you lose the cluster.
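The arithmetic behind that remark, as a quick sketch: the usual recommendation for minimum_master_nodes is a majority of the master-eligible nodes, (N / 2) + 1 with integer division. The function names here are just for illustration.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: (N / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (2, 3):
    print(n, "nodes: quorum", minimum_master_nodes(n),
          "tolerates", tolerated_failures(n), "failure(s)")
```

With 3 master-eligible nodes the quorum is 2 and one node can fail; with 2 nodes the quorum is still 2, so losing either one leaves the cluster without a master.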

