We recently had a similar issue with a very unstable ES node. Our setup:
3 data nodes (master set to false)
1 master node
All 4 of them are running on a RHEL virtual machine with 8 GB of RAM.
2.6.32-504.el6.x86_64 GNU/Linux
Running on ESX vCenter version 5.5.0
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
ES: 5.3.0
From time to time, one of the data nodes just freezes without any error message in its log.
The Java process is still running, but the ES master has detected that one node is disconnected.
The only log entry is on the master node:
[2017-05-13T13:24:32,031][INFO ][o.e.c.r.a.AllocationService] [hotstmaster] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{xxxxxxxxxx}{RIiOYjqjSSKsixk9j7NMrg}{Xiup0jG0Q5-hESB4HupRkQ}{xxx.xxx.xx.xxx}{xxx.xxx.xx.xxx:9300} failed to ping, tried [3] times, each with maximum [30s] timeout]).
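In case it helps anyone scan a longer master log for the same event, here is a rough Python sketch. It is only an illustration: the regex is fitted to the single log line quoted above, and the function name and group names (`parse_health_change`, `old`, `new`, `tries`, `timeout`) are my own choices, not anything defined by Elasticsearch.

```python
import re

# Pattern fitted to the master log line quoted above.
# Group names are illustrative only.
HEALTH_CHANGE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]"      # [2017-05-13T13:24:32,031]
    r"\[(?P<level>\w+)\s*\]"          # [INFO ]
    r"\[(?P<logger>[^\]]+)\]\s+"      # [o.e.c.r.a.AllocationService]
    r"\[(?P<node>[^\]]+)\]\s+"        # name of the node that wrote the line
    r"Cluster health status changed from \[(?P<old>\w+)\] to \[(?P<new>\w+)\]"
    r".*?failed to ping, tried \[(?P<tries>\d+)\] times, "
    r"each with maximum \[(?P<timeout>\w+)\] timeout"
)


def parse_health_change(line):
    """Return the fields of a ping-failure health change, or None if no match."""
    match = HEALTH_CHANGE.search(line)
    return match.groupdict() if match else None


# The exact line from our master log, used as sample input.
SAMPLE = (
    "[2017-05-13T13:24:32,031][INFO ][o.e.c.r.a.AllocationService] [hotstmaster] "
    "Cluster health status changed from [GREEN] to [YELLOW] (reason: "
    "[{xxxxxxxxxx}{RIiOYjqjSSKsixk9j7NMrg}{Xiup0jG0Q5-hESB4HupRkQ}"
    "{xxx.xxx.xx.xxx}{xxx.xxx.xx.xxx:9300} failed to ping, tried [3] times, "
    "each with maximum [30s] timeout])."
)

if __name__ == "__main__":
    event = parse_health_change(SAMPLE)
    if event:
        print(event["timestamp"], event["old"], "->", event["new"],
              "after", event["tries"], "pings of", event["timeout"])
```

Running it over every line of the master log and keeping the non-None results would give a list of the times a node was dropped, which can then be compared with whatever was happening on the VM at that moment.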
We had to kill the process with kill -9 <PID> and restart it to reconnect the lost node.
ES then relocated shards onto it.
What can we do to stabilize our cluster?
Thanks a lot and BR