Hello,
For the past few weeks, the cluster has been returning many "Timeout" errors, even though nothing was changed on it.
The cluster currently consists of 7 nodes (3 master/data/ingest and 4 data-only) and is running version 5.6.7 (an upgrade is planned soon, once this problem is fixed).
It holds 26 indices and 131 shards, for a total of around 6 billion documents.
I enabled TRACE logging on the "TransportService" tracer, and the traces show that the response is indeed sent, but too late: in the first exchange below, request 3417419 is sent at 09:35:03,403 with a 1s timeout, yet node-04 only logs receiving it at 09:35:05,113, about 1.7s later, so the request had already timed out on node-01 before node-04 even handled it.
Do you have any idea what could be causing this?
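For reference, the tracer was enabled with a transient cluster settings update along these lines (a minimal sketch; the endpoint is illustrative, run it against any node's HTTP port):

# Raise the TransportService tracer logger to TRACE (transient, so it can be reset to null afterwards)
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger.org.elasticsearch.transport.TransportService.tracer": "TRACE"
  }
}'

The relevant traces from node-01 and node-04 follow.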
[2020-01-09T09:35:03,403][TRACE][o.e.t.T.tracer ] [node-01] [3417419][cluster:monitor/nodes/info[n]] sent to [{node-04}{RN2FH2eFRZWIUSYgovE04Q}{GN-nWCqkTWiWTcBGMK84bg}{10.0.0.1}{10.0.0.1:9300}] (timeout: [1s])
[2020-01-09T09:35:04,403][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [node-01] failed to execute on node [RN2FH2eFRZWIUSYgovE04Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node-04][10.0.0.1:9300][cluster:monitor/nodes/info[n]] request_id [3417419] timed out after [1001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.7.jar:5.6.7]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
[2020-01-09T09:35:05,113][TRACE][o.e.t.T.tracer ] [node-04] [3417419][cluster:monitor/nodes/info[n]] received request
[2020-01-09T09:35:05,114][TRACE][o.e.t.T.tracer ] [node-04] [3417419][cluster:monitor/nodes/info[n]] sent response
[2020-01-09T09:35:05,115][WARN ][o.e.t.TransportService ] [node-01] Received response for a request that has timed out, sent [1713ms] ago, timed out [712ms] ago, action [cluster:monitor/nodes/info[n]], node [{node-04}{RN2FH2eFRZWIUSYgovE04Q}{GN-nWCqkTWiWTcBGMK84bg}{10.0.0.1}{10.0.0.1:9300}], id [3417419]
...
[2020-01-09T09:35:03,866][TRACE][o.e.t.T.tracer ] [node-01] [3417431][cluster:monitor/nodes/info[n]] sent to [{node-04}{RN2FH2eFRZWIUSYgovE04Q}{GN-nWCqkTWiWTcBGMK84bg}{10.0.0.1}{10.0.0.1:9300}] (timeout: [1s])
[2020-01-09T09:35:04,867][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [node-01] failed to execute on node [RN2FH2eFRZWIUSYgovE04Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node-04][10.0.0.1:9300][cluster:monitor/nodes/info[n]] request_id [3417431] timed out after [1000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.7.jar:5.6.7]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
[2020-01-09T09:35:05,113][TRACE][o.e.t.T.tracer ] [node-04] [3417431][cluster:monitor/nodes/info[n]] received request
[2020-01-09T09:35:05,114][TRACE][o.e.t.T.tracer ] [node-04] [3417431][cluster:monitor/nodes/info[n]] sent response
[2020-01-09T09:35:05,119][WARN ][o.e.t.TransportService ] [node-01] Received response for a request that has timed out, sent [1253ms] ago, timed out [253ms] ago, action [cluster:monitor/nodes/info[n]], node [{node-04}{RN2FH2eFRZWIUSYgovE04Q}{GN-nWCqkTWiWTcBGMK84bg}{10.0.0.1}{10.0.0.1:9300}], id [3417431]