Data nodes left cluster

Today I noticed that two data nodes left the cluster. Disk, CPU, and RAM all looked fine, but I can't find the reason in the logs.
I am currently using the latest version, ELK 7.13.2, running 3 master nodes, 6 data nodes, and 2 coordinating nodes.
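
For reference, this is roughly how I check node membership and cluster health. It is only a minimal sketch, assuming the Python elasticsearch client (7.x) and that the cluster is reachable on localhost:9200 (adjust the host to one of your coordinating nodes):

```python
# Minimal sketch: list the nodes currently in the cluster and the overall health,
# so a missing data node stands out. Assumes the elasticsearch Python client (7.x)
# and a cluster reachable on localhost:9200.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# _cat/nodes with roles: missing data nodes are easy to spot here.
print(es.cat.nodes(v=True, h="name,node.role,master,ip"))

# Cluster health shows current node counts and shard status.
health = es.cluster.health()
print(health["status"], health["number_of_nodes"], health["number_of_data_nodes"])
```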

[2021-08-07T20:23:26,020][INFO ][o.e.c.s.ClusterApplierService] [ed5] removed {{ed3}{Gii_DpsTTc6uIfOZX92Ntg}{HLkfUyZPRmiMGUWiJEqyPg}{ed3}{xx.xx.xx.xx:9300}{hs}}, term: 3, version: 11791, reason: ApplyCommitRequest{term=3, version=11791, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}
[2021-08-07T20:25:08,563][INFO ][o.e.c.s.ClusterApplierService] [ed5] added {{ed3}{Gii_DpsTTc6uIfOZX92Ntg}{MoUn8QH1SnSfh2A6AZ_ZZg}{ed3}{xx.xx.xx.xx:9300}{hs}}, term: 3, version: 11812, reason: ApplyCommitRequest{term=3, version=11812, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}
[2021-08-07T20:33:39,674][INFO ][o.e.c.s.ClusterApplierService] [ed5] removed {{em3}{7KlJJ-vvRoGgxS5HZPARQw}{1KbUL50cRSyuBIYiQoVTNw}{em3}{xx.xx.xx.xx:9300}{m}}, term: 3, version: 11867, reason: ApplyCommitRequest{term=3, version=11867, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}
[2021-08-07T20:34:09,792][INFO ][o.e.c.s.ClusterApplierService] [ed5] removed {{ec1}{x8nHc1dDSgiTh3G9rD8raQ}{AkNhb4HGT_i9gQAchqnX9w}{ec1}{xx.xx.xx.xx:9300}, {ed6}{JKN7eWd3ToyA2b_WbnPLUg}{NN0HXQX8S0WS98j3OPIPjA}{ed6}{xx.xx.xx.xx:9300}{hs}}, term: 3, version: 11868, reason: ApplyCommitRequest{term=3, version=11868, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}
[2021-08-07T20:34:09,811][INFO ][o.e.i.s.IndexShard       ] [ed5] [log-wlb-powershell-2021.08.04-000005][0] primary-replica resync completed with 0 operations
[2021-08-07T20:34:09,817][INFO ][o.e.i.s.IndexShard       ] [ed5] [log-pb-tls-2021.08.03-000005][0] primary-replica resync completed with 0 operations
[2021-08-07T20:34:09,821][INFO ][o.e.i.s.IndexShard       ] [ed5] [log-wlb-application-2021.06.05-000003][0] primary-replica resync completed with 0 operations
[2021-08-07T20:34:09,832][INFO ][o.e.i.s.IndexShard       ] [ed5] [log-wlb-system-2021.06.05-000003][0] primary-replica resync completed with 0 operations
[2021-08-07T20:34:10,238][INFO ][o.e.t.TaskCancellationService] [ed5] failed to remove the parent ban for task h8mbKJT9TnqsvZ-XiBiDlg:23079407 for connection org.elasticsearch.transport.TcpTransport$NodeChannels@452ed3fb
[2021-08-07T20:34:10,563][WARN ][r.suppressed             ] [ed5] path: /_cluster/stats, params: {}
org.elasticsearch.tasks.TaskCancelledException: task cancelled
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.finishHim(TransportNodesAction.java:262) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:256) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$100(TransportNodesAction.java:186) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:236) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.transport.TransportService$5.handleException(TransportService.java:738) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1283) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.transport.TransportService$8.run(TransportService.java:1145) [elasticsearch-7.13.2.jar:7.13.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.13.2.jar:7.13.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:831) [?:?]
[2021-08-07T20:34:11,212][INFO ][o.e.t.TaskCancellationService] [ed5] failed to remove the parent ban for task h8mbKJT9TnqsvZ-XiBiDlg:23079566 for connection org.elasticsearch.transport.TcpTransport$NodeChannels@452ed3fb
[2021-08-07T20:34:11,212][WARN ][r.suppressed             ] [ed5] path: /_cluster/stats, params: {}
org.elasticsearch.tasks.TaskCancelledException: task cancelled
[2021-08-07T20:34:11,216][INFO ][o.e.t.TaskCancellationService] [ed5] failed to remove the parent ban for task h8mbKJT9TnqsvZ-XiBiDlg:23079540 for connection org.elasticsearch.transport.TcpTransport$NodeChannels@452ed3fb
[2021-08-07T20:34:11,217][INFO ][o.e.t.TaskCancellationService] [ed5] failed to remove the parent ban for task h8mbKJT9TnqsvZ-XiBiDlg:23079541 for connection org.elasticsearch.transport.TcpTransport$NodeChannels@452ed3fb
[2021-08-07T20:35:46,201][INFO ][o.e.c.s.ClusterApplierService] [ed5] added {{em3}{7KlJJ-vvRoGgxS5HZPARQw}{TSSQgSEGSIeWyT9QMAKGWQ}{em3}{xx.xx.xx.xx:9300}{m}}, term: 3, version: 11886, reason: ApplyCommitRequest{term=3, version=11886, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}
[2021-08-07T20:35:47,191][INFO ][o.e.c.s.ClusterApplierService] [ed5] added {{ec1}{x8nHc1dDSgiTh3G9rD8raQ}{swvNM2HlS7G1V-ts85uvHA}{ec1}{xx.xx.xx.xx:9300}}, term: 3, version: 11887, reason: ApplyCommitRequest{term=3, version=11887, sourceNode={em1}{lHTyGlo5RYutmGDIY0tbLw}{XNPidE0MSQi0c5HY1u96qg}{em1}{xx.xx.xx.xx:9300}{m}{xpack.installed=true, transform.node=false}}

Is this log from the master? I do not see the node-left lines that are generated when a node leaves the cluster. Can you share the master logs from the moment the nodes left the cluster?
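
To find out which node's log file to look at, something like this can help. It is just a sketch, assuming the Python elasticsearch client (7.x) and a reachable coordinating node, not something taken from the logs above:

```python
# Sketch: find the currently elected master so its log file can be checked
# for the node-left entries. Assumes the elasticsearch Python client (7.x).
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# _cat/master prints the node id, host, ip, and name of the elected master.
print(es.cat.master(v=True))
```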

Also, how are you running your cluster? Is it on-premises? A cloud provider? Docker? I had a node-left issue a couple of months ago that was related to underlying network issues; maybe your case is similar.
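
If it does turn out to be network-related, the follower-check fault detection settings are worth a look, since they control how quickly the master removes an unresponsive node. A sketch of how to inspect them (the `cluster.fault_detection.*` setting names are the standard ones; the client call assumes the Python elasticsearch client, 7.x):

```python
# Sketch: inspect the fault-detection settings that govern when the master
# removes a node (follower checks). Assumes the elasticsearch Python client (7.x).
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

settings = es.cluster.get_settings(include_defaults=True, flat_settings=True)
for section in ("persistent", "transient", "defaults"):
    for key, value in settings.get(section, {}).items():
        if key.startswith("cluster.fault_detection"):
            print(section, key, "=", value)
```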

Did the nodes leave the cluster and come back a couple of seconds later, or did they not come back at all?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.