Nodes Constantly Disconnected from Cluster

I have 3 master nodes acting as hot nodes and 3 warm data nodes, all with 16 GB of RAM, holding 73 indices (3 shards, 1 replica each). However, the 3 warm nodes keep disconnecting at random and I can't figure out why. Pinging those 3 nodes shows no timeouts when the disconnections happen, and all the servers are on the same network.

This behavior shows up when I'm moving and reindexing indices to the warm nodes, which causes issues for the cluster.
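By "moving" I mean the usual box_type allocation-filtering approach, something like the following sketch (the index name is just an example; box_type is the node attribute visible in the logs below):

curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.require.box_type": "warm"
}
'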

ES version: 6.7
Logs from one of the master nodes:

[2019-04-01T12:13:07,732][INFO ][o.e.c.s.MasterService    ] [FI-ELK6-NODE-1] zen-disco-node-failed({FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm}), reason(transport disconnected)[{FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm} transport disconnected], reason: removed {{FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm},}
[2019-04-01T12:13:08,040][INFO ][o.e.c.s.ClusterApplierService] [FI-ELK6-NODE-1] removed {{FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm},}, reason: apply cluster state (from master [master {FI-ELK6-NODE-1}{0ho7GGABRHKH7Dvi73k8BA}{tHDN2KbQRT-wFqCTM3bGdw}{192.168.28.74}{192.168.28.74:9300}{xpack.installed=true, box_type=hot} committed version [1956] source [zen-disco-node-failed({FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm}), reason(transport disconnected)[{FI-ELK6-WarmNode1}{NBB7-Oj_ThS_jFS2vGJmaA}{KIMipXv9SbSyxZccmxifQA}{192.168.28.90}{192.168.28.90:9300}{xpack.installed=true, box_type=warm} transport disconnected]]])

Any help is appreciated.

BR.

I ran into the same problem today. Have you solved it?

It very much looks like a connectivity issue, but it's hard to tell any more from the two short log lines that you've shared. There will be more messages, including stack traces, that will contain more information.

I'll get the logs and do some testing once I'm back at the workplace, but based on my previous observations there were no timeouts when pinging the master. Yet at the same time Elasticsearch logged "[WARN ][o.e.d.z.ZenDiscovery ] [FI-ELK6-WarmNode3] master left (reason = failed to ping, tried [3] times, each with maximum [1m] timeout), current nodes: nodes:", which baffles me.

Ok, the "pings" that Elasticsearch mentions in that message are very different from the pings that the ping command sends, so a simple "ping test" is not a reliable indicator. Mostly when I've seen this sort of thing in the past it's normally been due to a misconfigured firewall. Note the following entry in the reference manual:

Elasticsearch opens a number of long-lived TCP connections between each pair of nodes in the cluster, and some of these connections may be idle for an extended period of time. Nonetheless, Elasticsearch requires these connections to remain open, and it can disrupt the operation of the cluster if any inter-node connections are closed by an external influence such as a firewall. It is important to configure your network to preserve long-lived idle connections between Elasticsearch nodes, for instance by leaving tcp.keep_alive enabled and ensuring that the keepalive interval is shorter than any timeout that might cause idle connections to be closed, or by setting transport.ping_schedule if keepalives cannot be configured.
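As a rough sketch (the exact values are illustrative and depend on your environment and firewall idle timeouts), that advice translates to something like the following on Linux, and/or an application-level ping in elasticsearch.yml:

# Linux kernel TCP keepalives: start probing idle connections well before any firewall idle timeout
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=6

# elasticsearch.yml
network.tcp.keep_alive: true      # default; enables SO_KEEPALIVE on transport connections
transport.ping_schedule: 5s       # periodic application-level pings if OS keepalives cannot be tuned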

An Elasticsearch ping happens at the application level over TCP and is therefore very different from an OS-level ping.
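If you want to check connectivity by hand, test the transport port itself rather than ICMP, for example (addresses taken from your logs):

# from the warm node, check the master's transport port is reachable
nc -vz 192.168.28.74 9300
# from the master, check the warm node's transport port
nc -vz 192.168.28.90 9300

Note that this only verifies that a new connection can be opened; it won't catch a firewall that silently drops long-idle connections, which is exactly why the keepalive settings above matter.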

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.