Help with Heap at 100% and random node disconnect


For the past few weeks we have been seeing strange Elasticsearch behaviour: heap usage on a data node climbs, seemingly at random, to 100% and the node disconnects from the cluster. About a minute later the node joins the cluster again.

We are running ES 1.7.5 on JVM 1.7.0_95, Debian 8.4 jessie.


8-core CPU, 120GB RAM, heap size 31GB with UseCompressedOops := True

java -Xmx31G -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
bool UseCompressedOops := true {lp64_product}

We have 181 shards in total: 12 indices with 15 shards each, plus 1 index with a single shard. The data currently totals 2.67TB.

I know there is a lot more I could tell you, but I'm currently lost in all the debug data. Can anybody guide me through debugging this issue? Please ask for whatever you need to figure out what could be wrong.

Many thanks to everybody in advance.

Is there much happening with GC?
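
One way to check, as a rough sketch: pull the JVM section of the node stats API and keep only the heap and GC counters. This assumes Elasticsearch is reachable at http://localhost:9200 and that your version reports heap_used_percent, collection_count and collection_time_in_millis there. ES_URL and gc_fields are names made up for this example.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL otherwise.
ES_URL="${ES_URL:-http://localhost:9200}"

# gc_fields (hypothetical helper): reads pretty-printed node stats JSON on stdin
# and keeps only the node name, heap usage and GC counter lines.
# The "young"/"old" keys are kept as labels; they may also show up for memory pools.
gc_fields() {
  grep -E '"(name|heap_used_percent|young|old|collection_count|collection_time_in_millis)"'
}

# Usage against a live cluster (run it a few times, some seconds apart):
#   curl -s "$ES_URL/_nodes/stats/jvm?pretty" | gc_fields
```

If the old-generation collection_count and collection_time_in_millis keep growing quickly between runs while heap_used_percent stays near 100, the node is probably spending most of its time in full GCs, which would fit the disconnects.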