The ES cluster has 3 nodes in total: 10.202.152.18, 10.202.152.10, and 10.202.152.15.
Elasticsearch occasionally reports timeouts. Can anyone help me figure out what the problem might be? CPU and RAM usage were not high at the time.
This is the Elasticsearch server log:
[2024-04-02T17:43:47,318][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve stats for node [zxt4RAOiRZy9Lol9IdIGfg]: [node_2][10.202.152.18:9300][cluster:monitor/nodes/stats[n]] request_id [77247682] timed out after [15016ms]
[2024-04-02T17:43:47,319][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [zxt4RAOiRZy9Lol9IdIGfg]: [node_2][10.202.152.18:9300][indices:monitor/stats[n]] request_id [77247683] timed out after [15016ms]
[2024-04-02T17:43:47,319][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [h4Igv51yStGQT552LJlITg]: [node_1][10.202.152.10:9300][indices:monitor/stats[n]] request_id [77247684] timed out after [15016ms]
[2024-04-02T17:43:47,320][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [WSHeMx5fQbGTE_GtNc2fKg]: [node_3][10.202.152.15:9300][indices:monitor/stats[n]] request_id [77247685] timed out after [15016ms]
[2024-04-02T17:44:32,321][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve stats for node [WSHeMx5fQbGTE_GtNc2fKg]: [node_3][10.202.152.15:9300][cluster:monitor/nodes/stats[n]] request_id [77247687] timed out after [15009ms]
[2024-04-02T17:44:32,323][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve stats for node [zxt4RAOiRZy9Lol9IdIGfg]: [node_2][10.202.152.18:9300][cluster:monitor/nodes/stats[n]] request_id [77247688] timed out after [15009ms]
[2024-04-02T17:44:32,324][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [zxt4RAOiRZy9Lol9IdIGfg]: [node_2][10.202.152.18:9300][indices:monitor/stats[n]] request_id [77247689] timed out after [15009ms]
[2024-04-02T17:44:32,324][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [h4Igv51yStGQT552LJlITg]: [node_1][10.202.152.10:9300][indices:monitor/stats[n]] request_id [77247690] timed out after [15009ms]
[2024-04-02T17:44:32,324][WARN ][o.e.c.InternalClusterInfoService] [node_2] failed to retrieve shard stats from node [WSHeMx5fQbGTE_GtNc2fKg]: [node_3][10.202.152.15:9300][indices:monitor/stats[n]] request_id [77247691] timed out after [15009ms]
[2024-04-02T17:44:44,542][WARN ][o.e.t.TransportService ] [node_2] Received response for a request that has timed out, sent [7.2m/432269ms] ago, timed out [6.9m/417260ms] ago, action [indices:monitor/stats[n]], node [{node_3}{WSHeMx5fQbGTE_GtNc2fKg}{j0aLmJ_1TnCAxNbdFUmuDw}{10.202.152.15}{10.202.152.15:9300}{cdfhilmrstw}{ml.machine_memory=16656986112, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=1038876672, transform.node=true}], id [77247566]
[2024-04-02T17:44:44,543][WARN ][o.e.t.TransportService ] [node_2] Received response for a request that has timed out, sent [6.4m/387243ms] ago, timed out [6.2m/372235ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node_3}{WSHeMx5fQbGTE_GtNc2fKg}{j0aLmJ_1TnCAxNbdFUmuDw}{10.202.152.15}{10.202.152.15:9300}{cdfhilmrstw}{ml.machine_memory=16656986112, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=1038876672, transform.node=true}], id [77247602]
[2024-04-02T17:44:44,543][WARN ][o.e.t.TransportService ] [node_2] Received response for a request that has timed out, sent [2.7m/162310ms] ago, timed out [2.4m/147294ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node_3}{WSHeMx5fQbGTE_GtNc2fKg}{j0aLmJ_1TnCAxNbdFUmuDw}{10.202.152.15}{10.202.152.15:9300}{cdfhilmrstw}{ml.machine_memory=16656986112, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=1038876672, transform.node=true}], id [77247669]
[2024-04-02T17:44:44,585][WARN ][o.e.t.OutboundHandler ] [node_2] sending transport message [Request{indices:monitor/stats[n]}{77247654}{false}{true}{false}] of size [840] on [Netty4TcpChannel{localAddress=/10.202.152.18:51856, remoteAddress=10.202.152.10/10.202.152.10:9300, profile=default}] took [297189ms] which is above the warn threshold of [5000ms] with success [true]
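In case it helps narrow this down, below is a minimal sketch of how I could capture extra diagnostics the next time a timeout appears (my assumption: the REST API is reachable on localhost:9200 without authentication; the base URL would need adjusting otherwise). It just saves hot threads, pending cluster-state tasks, and thread-pool stats to timestamped files.

```python
# Minimal diagnostic sketch (assumes REST API on localhost:9200, no auth).
# Saves hot threads, pending cluster tasks, and thread-pool stats to files
# so they can be attached the next time the timeout warnings appear.
import datetime
import urllib.request

BASE = "http://localhost:9200"  # assumed endpoint; change host/port if needed


def dump(path: str, label: str) -> None:
    """Fetch one diagnostic endpoint and write the response to a timestamped file."""
    with urllib.request.urlopen(BASE + path, timeout=30) as resp:
        body = resp.read().decode("utf-8")
    stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
    fname = f"{label}-{stamp}.txt"
    with open(fname, "w") as f:
        f.write(body)
    print(f"wrote {fname}")


if __name__ == "__main__":
    dump("/_nodes/hot_threads?threads=5", "hot_threads")  # what each node is busy with
    dump("/_cluster/pending_tasks", "pending_tasks")      # queued cluster-state tasks
    dump("/_cat/thread_pool?v", "thread_pool")            # thread-pool queues/rejections
```

Run right when the InternalClusterInfoService warnings show up, this should indicate whether the nodes are stuck on something (GC, I/O, a saturated thread pool) even though overall CPU and RAM look normal.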