For some reason the heap fills up very fast on two of our data nodes, while the other two behave normally.
I have a cluster with 9 nodes:
ES Version: 6.4.0
Running on Ubuntu 16.04.4.
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
Our four data nodes have the same configuration, and the hardware is identical.
The hardware looks like this:
28 GB RAM.
ES is configured to use a 14 GB heap.
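The exact heap lines got cut from this post; as a sketch, the standard way to set this in jvm.options (with min and max pinned to the same value, as recommended) would be:

```
# jvm.options — heap sized to 14 GB, min == max
-Xms14g
-Xmx14g
```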
And this is the GC configuration:
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
This graph shows heap usage:
So nodes 01 and 02 slowly use up the heap, and then GC kicks in at 75%.
Nodes 03 and 04, on the other hand, use up the heap fast, and then GC kicks in at 75%.
03 and 04 also log a ton of these:
[2018-11-28T10:43:10,540][INFO ][o.e.m.j.JvmGcMonitorService] [escl01data04] [gc] overhead, spent [292ms] collecting in the last [1s]
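In case anyone wants to compare against their own cluster, this is roughly how I pull the per-node heap percentage and GC counters (assuming direct HTTP access to any node on the default port 9200; the host below is a placeholder):

```shell
# Hypothetical endpoint; point this at any node in the cluster.
ES_HOST="http://localhost:9200"

# Per-node heap usage percent and GC collector stats (young + old).
curl -s "$ES_HOST/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors"
```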
All of our indices have 2 primary shards and 2 replicas, and these are evenly distributed over the four data nodes.
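I verified the distribution with the cat APIs, roughly like this (host is a placeholder for one of our nodes):

```shell
# Hypothetical endpoint; point this at any node in the cluster.
ES_HOST="http://localhost:9200"

# Shard count and disk usage per node — confirms the distribution is even.
curl -s "$ES_HOST/_cat/allocation?v"

# Per-shard detail, in case one node holds unusually hot shards.
curl -s "$ES_HOST/_cat/shards?v"
```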
Anyone have any idea what might cause this?
I have compared jvm.options, elasticsearch.yml, the Java version, the Elasticsearch version, and the service configuration. All are identical.
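To rule out an override that doesn't live in jvm.options (e.g. an environment variable set by the service), the JVM arguments can also be read back from the running processes themselves (host below is a placeholder):

```shell
# Hypothetical endpoint; point this at any node in the cluster.
ES_HOST="http://localhost:9200"

# The JVM flags each node was actually started with.
curl -s "$ES_HOST/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.input_arguments"
```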
I've also checked the number of connections to the nodes, and they look about the same for each machine. Same with thread count, hovering around 500 threads.
And I've already tried restarting all nodes (not just the data nodes).