Load not evenly distributed

(Jannes Van Helsdingen) #1

Hi, I have a 3+ node setup, with all nodes having all roles.
One node gets up to 90% CPU with frequent garbage collection, the second node is a bit lower but reasonable, and the remaining nodes do nearly nothing. If I stop and start nodes, a different node gets the high load.

Are there any suggestions on how I can even the load over 3+ nodes?

It's running on Linux, each node on its own machine, v5.5.2.

(Mark Walkom) #2

How are you interacting with the cluster, Kibana, something else?

(Jannes Van Helsdingen) #3

With X-Pack and Kibana, yes.

(Mark Walkom) #4

Right, but what about writing to the cluster?

(Jannes Van Helsdingen) #5

It's used for Exceptionless.

(Mark Walkom) #6

Does it do load balancing or does it just talk to a single node?

(Jannes Van Helsdingen) #7

It does load balancing (I tried with the IPs of all nodes in the config as well as a round-robin IP).

(Mark Walkom) #8

What does Monitoring tell you about differences in load/indexing/query loads?

(Jannes Van Helsdingen) #9

At the moment it's on 2 nodes because the 3rd node has just about no load. What I can see is:
Node 1 CPU: 26%, node 2 CPU: 80%. Memory on node 1 drops to almost 0 after garbage collection, whereas node 2 is still at 70%+ after garbage collection.
Request rate: indexing just over 2k for both, and search rate just over 1k for both.
Not sure how to check query load?

(Christian Dahlqvist) #10

What is the output of the cat nodes API?

(Jannes Van Helsdingen) #11

heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
          43          97  43    1.27    1.35     1.28 mdi       -
          45          98  81    3.65    3.64     3.80 mdi       *
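
For anyone comparing nodes the same way, here is a minimal Python sketch that parses whitespace-separated `_cat/nodes?v` output and reports the CPU gap between the busiest and idlest node. The sample text copies the values posted above; `parse_cat_nodes` and `cpu_spread` are hypothetical helper names for illustration, not part of any Elasticsearch client.

```python
def parse_cat_nodes(text):
    """Parse whitespace-separated `_cat/nodes?v` output into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    # Each data row has one value per header column, so zip pairs them up.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def cpu_spread(rows):
    """Return the difference between the highest and lowest CPU percentage."""
    cpus = [int(row["cpu"]) for row in rows]
    return max(cpus) - min(cpus)


# Sample copied from the output in this post (two nodes, both "mdi").
SAMPLE = """
heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
43 97 43 1.27 1.35 1.28 mdi -
45 98 81 3.65 3.64 3.80 mdi *
"""

rows = parse_cat_nodes(SAMPLE)
print("CPU spread between nodes:", cpu_spread(rows))
```

In practice the text would come from a request to the cat nodes endpoint on any node rather than from a hard-coded string.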

(Mark Harwood) #12

This may be related: https://github.com/elastic/elasticsearch/issues/24642
The fix is in the impending 6.0 release.

(Jannes Van Helsdingen) #14

Thanks, is there anything I can set to fix it on 5.5.2?

Will removing Kibana (X-Pack on the nodes) solve the problem (if that's even possible to do without breaking Elasticsearch), or would using new nodes without X-Pack/Kibana?

(Mark Harwood) #15

Two ugly choices:

  • Set replicas to zero to rebalance with all primaries (and no redundancy!)
  • Install a proxy between Kibana and elasticsearch to strip out preference=sessionId parameters
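
The rewriting step of the second option might look roughly like the Python sketch below. It only shows how a request could be rewritten, not a full proxy. Depending on the client, the preference may arrive as a URL query parameter or as a `preference` key in the header lines of an `_msearch` body, so both cases are covered; which form a given Kibana version uses is an assumption to verify. The function names are hypothetical.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_preference_from_url(url):
    """Drop any `preference` query parameter, keeping all others in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "preference"
    ]
    # Note: valueless flags such as `?pretty` are re-encoded as `pretty=`.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def strip_preference_from_msearch(body):
    """Drop `preference` from the header lines of an NDJSON _msearch body.

    An _msearch body alternates header line, search body line, so only
    the even-numbered lines (0, 2, 4, ...) are treated as headers.
    """
    out = []
    for i, line in enumerate(body.split("\n")):
        if i % 2 == 0 and line.strip():
            header = json.loads(line)
            header.pop("preference", None)
            line = json.dumps(header, separators=(",", ":"))
        out.append(line)
    return "\n".join(out)


print(strip_preference_from_url(
    "http://es:9200/_msearch?preference=1504000000000&timeout=30000"
))
```

Keep in mind that stripping the preference also gives up the warm-cache benefit it exists to provide.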

Kibana uses sessionId-based routing to ensure each user revisits the same nodes and has warm caches for their queries, so the feature is desirable. However, the 5.x routing logic for choosing between primary and replica is not ideal when this feature is in use.

Note, however, that with many users the collective load should spread evenly across data nodes, whereas a single user will load the cluster unevenly.

(Jannes Van Helsdingen) #16

Thank you!

(Jannes Van Helsdingen) #17

I don't see how it can be Kibana, because Exceptionless connects directly to the nodes, so its traffic shouldn't be affected by Kibana.

(Christian Dahlqvist) #18

You said you had 3 nodes in the cluster, but I only see two here? Have you set minimum_master_nodes correctly to avoid split brain scenarios?

(Mark Harwood) #19

SessionID based query routing is a feature supported by elasticsearch and used by Kibana. It may also be used by exceptionless.

(Jannes Van Helsdingen) #20

Yes, minimum_master_nodes is set up correctly. It used to be 3 nodes, but I stopped one because it was literally doing nothing (5% CPU and 4 garbage collections a day). I'm aware it's not an ideal setup at the moment; minimum_master_nodes is set to 2, so split brain should not be a problem.

The number of shards is set to the number of nodes, so as I understand it the primary shards of the index should be split between all nodes.

(Mark Walkom) #21

No, but if you lose one master you lose the cluster.
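
The arithmetic behind that can be sketched in a few lines of Python. It assumes the usual pre-7.0 recommendation for minimum_master_nodes, (master-eligible nodes / 2) + 1 with integer division; the function names are illustrative only.

```python
def quorum(master_eligible):
    """Recommended minimum_master_nodes for a given number of master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerable_master_losses(master_eligible):
    """How many master-eligible nodes can fail while a quorum still exists."""
    return master_eligible - quorum(master_eligible)


for n in (2, 3):
    print(
        f"{n} master-eligible nodes: quorum={quorum(n)}, "
        f"can lose {tolerable_master_losses(n)}"
    )
```

With two master-eligible nodes the quorum is 2, so no node can be lost; with three the quorum is still 2, so the cluster survives the loss of one.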