One master node removes and re-adds cluster nodes on a regular basis


(Thomas Tomski) #1

Hello,

I have an Elasticsearch cluster running with 8 data nodes and 3 master-only nodes.
One of these master nodes removes and re-adds cluster members on a regular basis.
Here is an extract from the log file:

Here the node gets removed:
[2018-11-14T15:34:07,372][INFO ][o.e.c.s.ClusterApplierService] [logstash01.mgm.xxx] removed {{elasticsearch03.mgm.xxx}{nldBGzbgQ62AV6k29wG2dQ}{zc8HBwN6SDqksV0A1Xx8qA}{10.201.x.xx}{10.201.x.xx:9300}{ml.machine_memory=8352514048, rack=live, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {logstash01.mgm.xxx}{90Vy_3eISGONm0lkcmdmig}{rI7B6L0uQNyPNTbvzMIWew}{10.83.x.xx}{10.83.x.xx:9300}{ml.machine_memory=4124659712, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [123681]])

Here the previously removed node gets added to the cluster again:
[2018-11-14T15:34:11,191][INFO ][o.e.c.s.ClusterApplierService] [logstash01.mgm.xxx] added {{elasticsearch03.mgm.xxx}{nldBGzbgQ62AV6k29wG2dQ}{zc8HBwN6SDqksV0A1Xx8qA}{10.201.x.xx}{10.201.x.xx:9300}{ml.machine_memory=8352514048, rack=live, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {logstash01.mgm.xxx}{90Vy_3eISGONm0lkcmdmig}{rI7B6L0uQNyPNTbvzMIWew}{10.83.xx.xx}{10.83.xx.xx:9300}{ml.machine_memory=4124659712, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [123683]])

What can cause this behavior?

Best regards, Thomas


(Christian Dahlqvist) #2

Do you have a very large cluster state, e.g. lots of indices and shards? Are your nodes under heavy load and/or suffering from long or frequent GC?


(Thomas Tomski) #3

Hello,

The problem was the same as here:

It was solved by modifying the sysctl parameters:

# Reduce the keepalive values so keepalive packets are sent across the network more frequently, to prevent Elasticsearch nodes from losing their connections to each other
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
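
For anyone wondering what these three values mean in practice, here is a rough sketch of the arithmetic. It assumes the usual Linux behaviour: the first keepalive probe goes out after tcp_keepalive_time seconds of idle time, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes. The function name is made up for illustration, and the "defaults" are the stock Linux kernel defaults (7200 / 75 / 9), not values taken from my nodes.

```python
def keepalive_timings(time_s: int, intvl_s: int, probes: int) -> dict:
    """Approximate TCP keepalive timings for an idle connection (hypothetical helper)."""
    return {
        # Idle time before the first keepalive probe is sent.
        "first_probe_after_s": time_s,
        # Idle time plus all unanswered probes before the kernel drops the connection.
        "dead_peer_detected_after_s": time_s + intvl_s * probes,
    }


# Values from the sysctl settings above.
tuned = keepalive_timings(600, 60, 20)
# Stock Linux kernel defaults, for comparison (assumption: unmodified system).
defaults = keepalive_timings(7200, 75, 9)

print("tuned:   ", tuned)
print("defaults:", defaults)
```

So with the new settings an idle connection sees traffic every 10 minutes instead of every 2 hours, which should be what keeps it from being silently dropped somewhere along the path.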

Cheers!