I have two separate 5-node ES clusters, both running on the same VMware cluster. Every now and again the health of one of the clusters will go to yellow or even red. When I look at Marvel, I'll see that one of the nodes is no longer part of the cluster.
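To pin down exactly when a node drops out, one option is to log the cluster health at a fixed interval and compare the timestamps against events on the VMware side. The following is only a rough sketch: it assumes the standard `_cluster/health` endpoint is reachable on `localhost:9200` without authentication, and `HEALTH_URL`, `EXPECTED_NODES`, `summarize_health`, `fetch_health` and `watch` are names made up for illustration.

```python
import json
import time
import urllib.request

# Assumptions for illustration: default HTTP port, no authentication,
# and a cluster that should contain 5 nodes.
HEALTH_URL = "http://localhost:9200/_cluster/health"
EXPECTED_NODES = 5


def summarize_health(health, expected_nodes=EXPECTED_NODES):
    """Turn a _cluster/health response (as a dict) into a one-line summary."""
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    missing = max(expected_nodes - nodes, 0)
    return "status={} nodes={}/{} missing={}".format(
        status, nodes, expected_nodes, missing
    )


def fetch_health(url=HEALTH_URL, timeout_s=10):
    """Fetch and decode the cluster health JSON document."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def watch(interval_s=60):
    """Print a timestamped health summary every interval_s seconds, forever."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            print(stamp, summarize_health(fetch_health()))
        except (OSError, ValueError) as exc:
            # Connection failures (OSError) and bad JSON (ValueError) are logged too,
            # since a failed check is itself a useful data point.
            print(stamp, "health check failed:", exc)
        time.sleep(interval_s)
```

Calling `watch()` from a machine outside the affected node and redirecting the output to a file would give a timeline of when the node count fell below 5.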
I'll log into the node and I can move around various directories just fine until I attempt to go into the directory ES is installed in. For example, if I just run ls in /opt (which is where ES is installed), the command hangs and I cannot do anything within /opt.
I am able to go into, say, /var/log and look at various log files, but I don't see anything there that explains why that part of the filesystem is not accessible.
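Since a plain ls in /opt ties up the shell, a check like this could be run with a timeout instead, so that the session stays usable while comparing /opt against a directory that still works. This is a minimal sketch, not a tested diagnostic: `probe_directory` and the 5-second default are arbitrary choices for illustration.

```python
import subprocess
import sys


def probe_directory(path, timeout_s=5.0):
    """Try to list path in a child process; return 'ok', 'error', or 'hung'."""
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but deliberately do not wait for the child: a process
        # blocked in uninterruptible I/O may not exit until the storage responds.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    # Paths come from the command line; /opt and /var/log are the defaults.
    for path in sys.argv[1:] or ["/opt", "/var/log"]:
        print(path, probe_directory(path))
```

A result of "hung" for /opt alongside "ok" for /var/log would suggest the problem is limited to whatever volume backs /opt rather than the whole VM.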
Once I reboot the system it comes back up just fine: ES, Kibana and Marvel all start, and the node rejoins the cluster. It then works for a varying amount of time. It might be a day or a week; it has even gone a couple of weeks without a problem.
OEL 7 3.10.0-123.el7
OpenJDK 25.101-b13
JRE 1.8.0_101-b13
ES 2.3.4
Kibana 4.5.4
Each VM has 32 GB of memory and 4 CPUs.