Hello,
For some reason the heap fills up very fast on two of our data nodes, while the other two behave normally.
I have a cluster with 9 nodes:
2 coordinator nodes,
3 master nodes,
4 data nodes
ES Version: 6.4.0
Running on Ubuntu 16.04.4.
Java version:
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
Our four data nodes have the same configuration, and the hardware is identical.
The hardware looks like this:
28 GB RAM,
6 CPUs
SSD
ES is configured to use a 14 GB heap:
-Xms14g
-Xmx14g
And this is the GC configuration:
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
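To rule out a difference in the effective JVM settings, the heap size, current usage and GC counters can be pulled for all nodes in one call to the nodes stats API. A minimal sketch in Python (assuming the cluster is reachable at http://localhost:9200 without authentication; adjust the URL to your setup):

```python
import json
import urllib.request

# Assumption: the cluster is reachable locally without authentication;
# change the URL (and add auth) to match your environment.
ES_URL = "http://localhost:9200"

with urllib.request.urlopen(ES_URL + "/_nodes/stats/jvm") as resp:
    stats = json.loads(resp.read().decode("utf-8"))

# Effective heap size, current usage and GC counters per node.
for node in stats["nodes"].values():
    jvm = node["jvm"]
    heap_max_gb = jvm["mem"]["heap_max_in_bytes"] / 1024 ** 3
    young = jvm["gc"]["collectors"]["young"]
    old = jvm["gc"]["collectors"]["old"]
    print("{}: heap max {:.1f} GB, used {}%, young GC {} runs / {} ms, old GC {} runs / {} ms".format(
        node["name"],
        heap_max_gb,
        jvm["mem"]["heap_used_percent"],
        young["collection_count"], young["collection_time_in_millis"],
        old["collection_count"], old["collection_time_in_millis"]))
```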
This graph shows heap usage:
Nodes 01 and 02 slowly use up the heap, and then GC kicks in at 75%.
Nodes 03 and 04, on the other hand, fill up the heap fast, and then GC kicks in at 75%.
Nodes 03 and 04 also log a ton of these:
[2018-11-28T10:43:10,540][INFO ][o.e.m.j.JvmGcMonitorService] [escl01data04] [gc][141] overhead, spent [292ms] collecting in the last [1s]
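To put a number on that overhead and compare it between the four data nodes, the JvmGcMonitorService lines can be summed per log file. A rough sketch (the default log path below is only an example; point it at each node's actual log):

```python
import re
import sys

# Matches the duration part of the JvmGcMonitorService "overhead" lines,
# e.g. "spent [292ms] collecting in the last [1s]".
PATTERN = re.compile(
    r"spent \[(\d+(?:\.\d+)?)(ms|s)\] collecting in the last \[(\d+(?:\.\d+)?)(ms|s)\]")

def to_ms(value, unit):
    return float(value) * (1000.0 if unit == "s" else 1.0)

def gc_overhead(log_path):
    spent_ms = 0.0
    window_ms = 0.0
    with open(log_path) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                spent_ms += to_ms(match.group(1), match.group(2))
                window_ms += to_ms(match.group(3), match.group(4))
    return spent_ms, window_ms

if __name__ == "__main__":
    # Example path only; point this at the node's actual log file.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/elasticsearch/elasticsearch.log"
    spent, window = gc_overhead(path)
    if window:
        print("{}: {:.0f} ms collecting out of {:.0f} ms reported ({:.1f}%)".format(
            path, spent, window, 100.0 * spent / window))
    else:
        print("No JvmGcMonitorService overhead lines found in", path)
```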
All of our indices have 2 primary shards and 2 replicas, and these are evenly distributed across the four data nodes.
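For reference, the per-node shard count and on-disk size can be double-checked with something like this (same assumption of local, unauthenticated access):

```python
import json
import urllib.request
from collections import defaultdict

# Assumption: cluster reachable locally without auth.
ES_URL = "http://localhost:9200"

with urllib.request.urlopen(
        ES_URL + "/_cat/shards?format=json&bytes=b&h=node,store") as resp:
    shards = json.loads(resp.read().decode("utf-8"))

# Count shards and sum store size per node.
per_node = defaultdict(lambda: {"shards": 0, "bytes": 0})
for shard in shards:
    node = shard.get("node") or "UNASSIGNED"
    per_node[node]["shards"] += 1
    per_node[node]["bytes"] += int(shard.get("store") or 0)

for node, info in sorted(per_node.items()):
    print("{}: {} shards, {:.1f} GB".format(
        node, info["shards"], info["bytes"] / 1024 ** 3))
```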
Anyone have any idea what might cause this?
I have compared jvm.options, elasticsearch.yml, Java version, Elasticsearch version and service configuration; they are all identical.
I've also checked the number of connections to the nodes, and they look about the same for each machine. Same with the thread count, which hovers around 500 threads on each node.
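Those numbers came from the OS; the same figures can also be read per node from the nodes stats API, which makes comparing all four data nodes in one call easier. A small sketch under the same local-access assumption:

```python
import json
import urllib.request

# Assumption: cluster reachable locally without auth; adjust as needed.
ES_URL = "http://localhost:9200"

with urllib.request.urlopen(ES_URL + "/_nodes/stats/jvm,http,transport") as resp:
    stats = json.loads(resp.read().decode("utf-8"))

# JVM thread count and open connection counts per node.
for node in stats["nodes"].values():
    print("{}: {} JVM threads, {} open HTTP connections, {} open transport connections".format(
        node["name"],
        node["jvm"]["threads"]["count"],
        node["http"]["current_open"],
        node["transport"]["server_open"]))
```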
And I've already tried restarting all nodes (not just the data nodes).