Elasticsearch 5.3.0 unstable nodes on VMware

We recently had a similar issue with a very unstable ES node. Our cluster has:
3 data nodes (node.master set to false)
1 master node

All 4 of them are running on a RHEL virtual machine with 8 GB of RAM:
Kernel: 2.6.32-504.el6.x86_64 GNU/Linux
Hypervisor: ESX, vCenter version 5.5.0
Java: Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
ES: 5.3.0

From time to time, one of the data nodes just freezes, without any error message in its log.
The Java process is still running, but the master has detected that the node disconnected. The only log entries are on the master node:
[2017-05-13T13:24:32,031][INFO ][o.e.c.r.a.AllocationService] [hotstmaster] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{xxxxxxxxxx}{RIiOYjqjSSKsixk9j7NMrg}{Xiup0jG0Q5-hESB4HupRkQ}{xxx.xxx.xx.xxx}{xxx.xxx.xx.xxx:9300} failed to ping, tried [3] times, each with maximum [30s] timeout]).
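The reason in that line comes from zen discovery fault detection: as far as I know, "tried [3] times, each with maximum [30s] timeout" corresponds to the 5.x defaults of discovery.zen.fd.ping_retries (3) and discovery.zen.fd.ping_timeout (30s). Below is a minimal sketch for pulling the node id, address and those two values out of such a line, assuming the timeout is always printed in whole seconds; FailedPing and parse_failed_ping are hypothetical helpers, not part of Elasticsearch.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the "failed to ping" reason the master logs for a lost node:
# {name}{nodeId}{ephemeralId}{host}{address} failed to ping, tried [N] times, each with maximum [Ts] timeout
# Assumption: the timeout is printed in whole seconds (e.g. "30s").
FAILED_PING = re.compile(
    r"\{(?P<name>[^}]*)\}\{(?P<node_id>[^}]*)\}\{(?P<ephemeral_id>[^}]*)\}"
    r"\{(?P<host>[^}]*)\}\{(?P<address>[^}]*)\} failed to ping, "
    r"tried \[(?P<retries>\d+)\] times, each with maximum \[(?P<timeout>\d+)s\] timeout"
)


@dataclass
class FailedPing:
    node_id: str
    address: str
    retries: int
    timeout_s: int

    @property
    def approx_detection_s(self) -> int:
        # Rough time the master waited before dropping the node, assuming
        # every retry ran into the full timeout. Ping intervals are ignored.
        return self.retries * self.timeout_s


def parse_failed_ping(line: str) -> Optional[FailedPing]:
    """Return the failed-ping details from a master log line, or None."""
    m = FAILED_PING.search(line)
    if m is None:
        return None
    return FailedPing(
        node_id=m.group("node_id"),
        address=m.group("address"),
        retries=int(m.group("retries")),
        timeout_s=int(m.group("timeout")),
    )


if __name__ == "__main__":
    log_line = (
        "[2017-05-13T13:24:32,031][INFO ][o.e.c.r.a.AllocationService] [hotstmaster] "
        "Cluster health status changed from [GREEN] to [YELLOW] (reason: "
        "[{xxxxxxxxxx}{RIiOYjqjSSKsixk9j7NMrg}{Xiup0jG0Q5-hESB4HupRkQ}"
        "{xxx.xxx.xx.xxx}{xxx.xxx.xx.xxx:9300} failed to ping, tried [3] times, "
        "each with maximum [30s] timeout])."
    )
    info = parse_failed_ping(log_line)
    print(info, info.approx_detection_s if info else None)
```

For the line above this gives 3 retries of 30 s each, so the master waited on the order of 90 seconds before declaring the node lost.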

We had to kill -9 the process by its PID and restart it to get the lost node to rejoin the cluster, after which ES relocated shards back onto it.

What can we do to stabilize our cluster?

Thanks a lot, and best regards

Why run 3 nodes on a single host that small? Why not just run a single one?

Oh, sorry: all nodes are running on 4 different VMs, on different ESX hosts.

Try a different type of installation.
For example, I installed ES 5.3 on Ubuntu 16.04 using dpkg and experienced the same thing: nodes left the cluster quite often.
Then, after updating my sources, I installed it with apt-get instead, and everything works well now.
Also check the JVM heap: it should be set to half your RAM size (4 GB on an 8 GB VM), with -Xms and -Xmx set to the same value in jvm.options.
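A minimal sketch of that rule of thumb, producing the two heap lines for jvm.options from the VM's RAM. The 31 GB cap is an added assumption based on Elastic's general guidance to keep the heap below the roughly 32 GB compressed-oops threshold; it does not affect 8 GB VMs. heap_lines is a hypothetical helper, not an Elasticsearch tool.

```python
from typing import List


def heap_lines(ram_mb: int, cap_mb: int = 31 * 1024) -> List[str]:
    """Return -Xms/-Xmx lines for jvm.options: half the RAM, capped at cap_mb."""
    heap_mb = min(ram_mb // 2, cap_mb)
    # Use a "g" suffix when the size is a whole number of GB, otherwise "m".
    size = f"{heap_mb // 1024}g" if heap_mb % 1024 == 0 else f"{heap_mb}m"
    # Min and max heap are set to the same value so the heap never resizes.
    return [f"-Xms{size}", f"-Xmx{size}"]


if __name__ == "__main__":
    # One of the 8 GB VMs from the question.
    print("\n".join(heap_lines(8 * 1024)))
```

For an 8 GB VM this yields -Xms4g and -Xmx4g.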

Hope this helps.
