Elasticsearch 5.3.0 unstable nodes on VMware

(LaurentL) #1

We recently had a similar issue with a very unstable ES node. Our cluster has:
3 data nodes (master set to false)
1 master node

The 4 of them are running on a RHEL virtual machine with 8 GB RAM.
2.6.32-504.el6.x86_64 GNU/Linux
Running on ESX vCenter version 5.5.0
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
ES : 5.3.0

From time to time, one of the data nodes just freezes without any error message in its log.
The Java process is still running, but the ES master detects that the node has disconnected.
The only log entry is on the master node:
[2017-05-13T13:24:32,031][INFO ][o.e.c.r.a.AllocationService] [hotstmaster] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{xxxxxxxxxx}{RIiOYjqjSSKsixk9j7NMrg}{Xiup0jG0Q5-hESB4HupRkQ}{xxx.xxx.xx.xxx}{xxx.xxx.xx.xxx:9300} failed to ping, tried [3] times, each with maximum [30s] timeout]).
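To see how often this happens, the master log can be scanned for this message. The "tried [3] times" and "[30s] timeout" values appear to correspond to the zen fault-detection settings discovery.zen.fd.ping_retries and discovery.zen.fd.ping_timeout at their 5.x defaults. Below is a minimal Python sketch that pulls the relevant fields out of a line like the one above. The function name parse_ping_failure and the regular expression are illustrative assumptions based only on this single log line, not an official log format.

```python
import re

# Assumed pattern, derived from the single master log line quoted above.
PING_FAILURE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]"                      # [2017-05-13T13:24:32,031]
    r"\[(?P<level>\w+)\s*\]"                          # [INFO ]
    r".*?Cluster health status changed from "
    r"\[(?P<old>\w+)\] to \[(?P<new>\w+)\]"           # [GREEN] to [YELLOW]
    r".*?\{(?P<address>[^{}]*:\d+)\} failed to ping, "  # {host:9300}
    r"tried \[(?P<tries>\d+)\] times, "
    r"each with maximum \[(?P<timeout>\w+)\] timeout"
)


def parse_ping_failure(line):
    """Return a dict describing a ping-failure log line, or None if it does not match."""
    match = PING_FAILURE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["tries"] = int(info["tries"])  # number of failed ping attempts
    return info


def scan_log(lines):
    """Collect all ping-failure events from an iterable of log lines."""
    events = []
    for line in lines:
        info = parse_ping_failure(line)
        if info is not None:
            events.append(info)
    return events
```

Passing an open master log file to scan_log gives a list of events with their timestamps and node addresses, which makes it easier to check whether the freezes line up with anything on the hypervisor side (snapshots, vMotion, backups).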

We had to kill the process (kill -9 on its PID) and restart it to reconnect the lost node.
ES then relocates shards onto it.

What can we do to stabilize our cluster?

Thanks a lot and BR

(Mark Walkom) #2

Why run 3 nodes on a single host that is only that big? Why not just run a single one?

(LaurentL) #3

Oh sorry, all the nodes are running on 4 different VMs, on different ESX hosts.


Try a different type of installation.
For example, I installed ES 5.3 on Ubuntu 16.04 using dpkg and experienced the same thing: nodes left the cluster quite often.
Then I updated my sources and installed with apt-get install, and everything works well now.
Also check the JVM heap; it should be set to no more than half your RAM.
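As a rough illustration of the heap rule, here is a small Python sketch that computes the -Xms/-Xmx lines for jvm.options from the machine's RAM. The helper name heap_flags and the 31 GB cap are assumptions for illustration (the cap reflects the common advice to keep the heap below roughly 32 GB); check the values against your own environment.

```python
def heap_flags(ram_gb, cap_gb=31):
    """Return jvm.options heap lines for a machine with ram_gb of RAM.

    The heap is set to half the RAM, capped at cap_gb (an assumed limit),
    with identical minimum and maximum so the heap is never resized.
    """
    heap_gb = min(ram_gb // 2, cap_gb)
    if heap_gb < 1:
        raise ValueError("not enough RAM for a 1 GB heap")
    return ["-Xms{}g".format(heap_gb), "-Xmx{}g".format(heap_gb)]


# For the 8 GB VMs described in this thread:
flags = heap_flags(8)  # ["-Xms4g", "-Xmx4g"]
```

The two resulting lines go into the node's jvm.options file; the rest of the RAM is left to the operating system for the filesystem cache.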

Hope this helps.
