Leader_check timeout?

Hi,

We have a 3-node search cluster running on 3 VMs.
We run Elasticsearch version 7.4.2.

All 3 nodes run with the same config:

discovery.seed_hosts: ["grtzpse1-7.bla", "grtzpse2-7.bla", "grtzpse3-7.bla"]
cluster.initial_master_nodes: ["grtzpse1-7", "grtzpse2-7", "grtzpse3-7"]
node.ingest: false
node.ml: false
bootstrap.memory_lock: true

The cluster is stable and responsive, but once every 2 or 3 days I get errors like these:

Node1:

[2020-06-23T09:03:34,148][INFO ][o.e.n.Node               ] [grtzpse1-7] started
[2020-06-23T21:25:48,730][WARN ][o.e.t.TransportService   ] [grtzpse1-7] Received response for a request that has timed out, sent [13809ms] ago, timed out [3803ms] ago, action [internal:coordination/fault_detection/leader_check], node [{grtzpse3-7}{x73H-wgITUOSQW89fxcl_w}{aYpPkx3vRdusQDX4JM2B7A}{192.168.102.197}{192.168.102.197:9300}{dm}{xpack.installed=true}], id [900581]
[2020-06-23T21:25:57,477][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969142] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:03,623][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952985] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:03,623][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969150] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:03,631][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706181] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:05,136][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952922] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:05,509][WARN ][o.e.t.TransportService   ] [grtzpse1-7] Received response for a request that has timed out, sent [13206ms] ago, timed out [3202ms] ago, action [internal:coordination/fault_detection/leader_check], node [{grtzpse3-7}{x73H-wgITUOSQW89fxcl_w}{aYpPkx3vRdusQDX4JM2B7A}{192.168.102.197}{192.168.102.197:9300}{dm}{xpack.installed=true}], id [900679]
[2020-06-23T21:26:07,161][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706224] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:16,942][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1953192] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:20,829][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706350] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:21,999][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Removing ban for the parent [x73H-wgITUOSQW89fxcl_w:1953192] on the node [2zD8aPi3REeSYDCm--Nb8Q]
[2020-06-23T21:26:27,720][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706425] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:32,167][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706447] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:32,170][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706451] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]
[2020-06-23T21:26:35,092][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse1-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706328] on the node [2zD8aPi3REeSYDCm--Nb8Q], reason: [by user request]

Node2:

[2020-06-23T21:25:56,661][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969142] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:25:57,547][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969150] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:25:59,363][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [grtzpse2-7] failed to execute on node [2zD8aPi3REeSYDCm--Nb8Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300][cluster:monitor/nodes/info[n]] request_id [1076895] timed out after [10009ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1022) [elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-06-23T21:26:00,392][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952922] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:26:01,912][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952985] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:26:03,811][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706181] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:26:07,336][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse2-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706224] on the node [bCSt2BCHRB2uAscmCWeuyw], reason: [by user request]
[2020-06-23T21:26:16,068][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [grtzpse2-7] failed to execute on node [2zD8aPi3REeSYDCm--Nb8Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300][cluster:monitor/nodes/info[n]] request_id [1077121] timed out after [10008ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1022) [elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]

Node3:

[2020-06-23T21:25:56,663][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969142] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
[2020-06-23T21:25:57,549][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [bCSt2BCHRB2uAscmCWeuyw:1969150] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
[2020-06-23T21:26:00,389][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952922] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
[2020-06-23T21:26:00,517][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [grtzpse3-7] failed to execute on node [2zD8aPi3REeSYDCm--Nb8Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300][cluster:monitor/nodes/info[n]] request_id [984015] timed out after [10006ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1022) [elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-06-23T21:26:01,909][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [x73H-wgITUOSQW89fxcl_w:1952985] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
[2020-06-23T21:26:02,716][WARN ][o.e.c.InternalClusterInfoService] [grtzpse3-7] Failed to update node information for ClusterInfoUpdateJob within 15s timeout
[2020-06-23T21:26:02,716][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [grtzpse3-7] failed to execute on node [2zD8aPi3REeSYDCm--Nb8Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300][cluster:monitor/nodes/stats[n]] request_id [983989] timed out after [15010ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1022) [elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-06-23T21:26:03,811][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706181] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
[2020-06-23T21:26:08,764][DEBUG][o.e.a.a.c.n.t.c.TransportCancelTasksAction] [grtzpse3-7] Received ban for the parent [2zD8aPi3REeSYDCm--Nb8Q:1706224] on the node [x73H-wgITUOSQW89fxcl_w], reason: [by user request]
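
Grouping the ReceiveTimeoutTransportException entries by target makes the pattern easier to see: in the node 2 and node 3 excerpts, every timed-out request (nodes/info after about 10 s, nodes/stats after about 15 s) was sent to grtzpse1-7 at 192.168.102.166. Below is a minimal Python sketch, assuming the log format shown above; tally_timeouts is a hypothetical helper, and in practice log_text would be the contents of each node's Elasticsearch log file.

```python
import re
from collections import Counter

# Matches "ReceiveTimeoutTransportException: [name][address][action] request_id [N] timed out after [Nms]".
# The action itself can contain brackets (e.g. "cluster:monitor/nodes/info[n]"), hence the lazy match;
# \s+ tolerates a line break before "after" in wrapped pastes.
TIMEOUT = re.compile(
    r"ReceiveTimeoutTransportException: \[(?P<node>[^\]]+)\]\[(?P<addr>[^\]]+)\]"
    r"\[(?P<action>.+?)\] request_id \[\d+\] timed out\s+after \[(?P<ms>\d+)ms\]"
)


def tally_timeouts(log_text: str) -> Counter:
    """Count timed-out requests per (target node, target address, action)."""
    counts = Counter()
    for m in TIMEOUT.finditer(log_text):
        counts[(m.group("node"), m.group("addr"), m.group("action"))] += 1
    return counts


# Exception lines copied from the node 2 and node 3 excerpts.
sample = "\n".join([
    "org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300]"
    "[cluster:monitor/nodes/info[n]] request_id [1076895] timed out after [10009ms]",
    "org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300]"
    "[cluster:monitor/nodes/info[n]] request_id [1077121] timed out after [10008ms]",
    "org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300]"
    "[cluster:monitor/nodes/info[n]] request_id [984015] timed out after [10006ms]",
    "org.elasticsearch.transport.ReceiveTimeoutTransportException: [grtzpse1-7][192.168.102.166:9300]"
    "[cluster:monitor/nodes/stats[n]] request_id [983989] timed out after [15010ms]",
])
for (node, addr, action), n in tally_timeouts(sample).items():
    print(node, addr, action, n)
```

On these four lines it reports three nodes/info timeouts and one nodes/stats timeout, all against grtzpse1-7.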

It looks to me like node 1 was not responding to the leader check.
Am I reading this correctly?
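
The numbers in the two WARN lines on node 1 can be decoded directly. Leader checks are sent by each node to the elected master, so these lines suggest grtzpse3-7 was the master at the time: grtzpse1-7 sent it a leader check and the reply arrived 13809 ms later. The timeout the sender applied is the difference of the two figures, 13809 - 3803 = 10006 ms (and 13206 - 3202 = 10004 ms for the second line), which matches the Elasticsearch 7.x default of 10s for cluster.fault_detection.leader_check.timeout. A minimal Python sketch that does this arithmetic, assuming the log format shown above (decode_late_response is a hypothetical helper, not an Elasticsearch tool):

```python
import re

# Matches the TransportService WARN logged when a response arrives after its request has timed out.
LATE_RESPONSE = re.compile(
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<timed_out>\d+)ms\] ago, "
    r"action \[(?P<action>[^\]]+)\], node \[\{(?P<node>[^}]+)\}"
)


def decode_late_response(line: str) -> dict:
    """Derive the effective timeout and the lateness from one late-response WARN line."""
    m = LATE_RESPONSE.search(line)
    if m is None:
        raise ValueError("not a late-response line")
    sent = int(m.group("sent"))
    timed_out = int(m.group("timed_out"))
    return {
        "action": m.group("action"),
        "target": m.group("node"),       # node the request was sent to
        "response_after_ms": sent,       # time from sending until the response arrived
        "timeout_ms": sent - timed_out,  # how long the sender waited before giving up
        "late_by_ms": timed_out,         # how long after the timeout the response arrived
    }


# Relevant part of the first WARN line from node 1.
example = (
    "sent [13809ms] ago, timed out [3803ms] ago, "
    "action [internal:coordination/fault_detection/leader_check], "
    "node [{grtzpse3-7}{x73H-wgITUOSQW89fxcl_w}{aYpPkx3vRdusQDX4JM2B7A}"
    "{192.168.102.197}{192.168.102.197:9300}{dm}{xpack.installed=true}], id [900581]"
)
print(decode_late_response(example))  # timeout_ms is 10006, late_by_ms is 3803
```

The timing alone does not say where the delay happened (network, grtzpse3-7, or node 1 itself).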

If so, what could be causing this?
And how can I get more information about what is going on here?
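
One way to gather more information is to capture node-level diagnostics while the timeouts are happening. Below is a minimal Python sketch (standard library only) that just builds the requests, assuming HTTP is on the default port 9200 and security is disabled; the base URL and the helper name diagnostic_requests are assumptions. It uses the nodes hot threads API, the nodes stats API (JVM, OS and filesystem metrics, where long GC pauses would show up in the collector times), and a transient cluster setting that raises the log level of the org.elasticsearch.cluster.coordination package.

```python
import json
import urllib.request

# Assumed HTTP address of one node; adjust host/port and add auth/TLS if security is enabled.
ES = "http://grtzpse1-7.bla:9200"


def diagnostic_requests(base: str = ES) -> list:
    """Build (but do not send) requests that help explain a slow or stalled node."""
    return [
        # What the busiest threads on every node are doing right now (plain-text response).
        urllib.request.Request(f"{base}/_nodes/hot_threads", method="GET"),
        # Per-node JVM heap/GC, OS and filesystem statistics (JSON response).
        urllib.request.Request(f"{base}/_nodes/stats/jvm,os,fs", method="GET"),
        # Temporarily raise the log level of the cluster coordination package.
        urllib.request.Request(
            f"{base}/_cluster/settings",
            data=json.dumps(
                {"transient": {"logger.org.elasticsearch.cluster.coordination": "DEBUG"}}
            ).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        ),
    ]


for req in diagnostic_requests():
    print(req.get_method(), req.full_url)
```

To actually send them, pass each request to urllib.request.urlopen and print the response body. Setting the same logger key to null afterwards restores the default log level.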

Thanks in advance!

Peter
