hi all,
I have a three-node Elasticsearch 6.5.4 cluster (nodes 235, 236, and 237) running on Windows. Kibana is installed on all three servers, and each Kibana instance connects to the Elasticsearch node on its own server.
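For context, each kibana.yml just points Kibana at the local node, roughly like this (a sketch; values are illustrative, not copied verbatim from my servers):

# kibana.yml on each server (sketch)
server.host: "0.0.0.0"
elasticsearch.url: "http://localhost:9200"    # each Kibana talks only to the Elasticsearch node on the same machine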
Sometimes the Kibana status changes from green to red. The Elasticsearch log entries for a few of these cases are shown below.
I read the Kibana status from the .monitoring-kibana index; the times at which it changed to red are noted at the start of each case.
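This is roughly the query I use to read the status from the monitoring index (the field path comes from my index mapping and may differ slightly in other setups):

GET .monitoring-kibana-6-*/_search
{
  "size": 5,
  "sort": [ { "timestamp": "desc" } ],
  "_source": [ "timestamp", "kibana_stats.kibana.name", "kibana_stats.kibana.status" ]
}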
Case 1:
Kibana went red on node-236 on 2019-04-14 at 10:43 and 10:44.
[2019-04-14T10:44:51,329][INFO ][o.e.d.z.ZenDiscovery ] [node-236] master_left [{node-237}{ynuGMeSVRYKsa_-95s0YmA}{o1aAN6vDT1akB5HKXyhZrQ}{0.0.0.237}{0.0.0.237:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2019-04-14T10:44:51,344][WARN ][o.e.d.z.ZenDiscovery ] [node-236] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
   {node-237}{ynuGMeSVRYKsa_-95s0YmA}{o1aAN6vDT1akB5HKXyhZrQ}{0.0.0.237}{0.0.0.237:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, master
   {node-236}{UWi2vw4-QfqJ_5rDe-j80A}{Ww4jfFFiROu4h-yW-UZLsQ}{0.0.0.236}{0.0.0.236:9300}{ml.machine_memory=8589328384, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local
[2019-04-14T10:44:54,375][WARN ][o.e.d.z.ZenDiscovery ] [node-236] not enough master nodes discovered during pinging (found [[Candidate{node={node-236}{UWi2vw4-QfqJ_5rDe-j80A}{Ww4jfFFiROu4h-yW-UZLsQ}{0.0.0.236}{0.0.0.236:9300}{ml.machine_memory=8589328384, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, clusterStateVersion=67330}]], but needed [2]), pinging again
[2019-04-14T10:44:55,125][WARN ][o.e.d.z.UnicastZenPing ] [node-236] failed to send ping to [{node-237}{ynuGMeSVRYKsa_-95s0YmA}{o1aAN6vDT1akB5HKXyhZrQ}{0.0.0.237}{0.0.0.237:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node-237][0.0.0.237:9300][internal:discovery/zen/unicast] request_id [19137729] timed out after [3860ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1038) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.4.jar:6.5.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_152]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_152]
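For completeness, discovery is the standard zen setup for three master-eligible nodes; each elasticsearch.yml has roughly this (a sketch; the minimum of 2 matches the "needed [2]" in the log above):

# elasticsearch.yml (sketch)
discovery.zen.ping.unicast.hosts: ["0.0.0.235", "0.0.0.236", "0.0.0.237"]
discovery.zen.minimum_master_nodes: 2    # majority of 3 master-eligible nodes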
Case 2:
Kibana went red on node-236 on 2019-04-14 at 10:46.
[2019-04-14T10:45:09,316][WARN ][r.suppressed ] [node-236] path: /_xpack/monitoring/_bulk, params: {system_id=kibana, system_api_version=6, interval=10000ms}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];
[2019-04-14T10:45:10,113][INFO ][o.e.c.s.ClusterApplierService] [node-236] detected_master {node-237}{ynuGMeSVRYKsa_-95s0YmA}{o1aAN6vDT1akB5HKXyhZrQ}{0.0.0.237}{0.0.0.237:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, reason: apply cluster state (from master [master {node-237}{ynuGMeSVRYKsa_-95s0YmA}{o1aAN6vDT1akB5HKXyhZrQ}{0.0.0.237}{0.0.0.237:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [67331]])
[2019-04-14T10:46:23,710][WARN ][o.e.m.j.JvmGcMonitorService] [node-236] [gc][old][86350][23] duration [22.9s], collections [1]/[23.7s], total [22.9s]/[36.7s], memory [2.7gb]->[1.5gb]/[3.9gb], all_pools {[young] [90.2mb]->[12.4mb]/[266.2mb]}{[survivor] [25.2mb]->[0b]/[33.2mb]}{[old] [2.6gb]->[1.5gb]/[3.6gb]}
[2019-04-14T10:46:23,710][WARN ][o.e.m.j.JvmGcMonitorService] [node-236] [gc][86350] overhead, spent [23s] collecting in the last [23.7s]
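Those GC lines show an old-generation collection of almost 23 seconds on a roughly 4 GB heap. The heap size is set in jvm.options along these lines (a sketch; 4g matches the [3.9gb] total reported by the GC monitor):

# jvm.options (sketch)
-Xms4g
-Xmx4g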
Case 3:
Kibana went red on node-237 on 2019-04-14 at 01:47, 01:48, and 01:49.
[2019-04-14T01:47:35,887][INFO ][o.e.m.j.JvmGcMonitorService] [node-237] [gc][5306110] overhead, spent [388ms] collecting in the last [1.2s]
[2019-04-14T01:47:54,451][INFO ][o.e.m.j.JvmGcMonitorService] [node-237] [gc][5306128] overhead, spent [315ms] collecting in the last [1.1s]
[2019-04-14T01:48:27,181][WARN ][o.e.t.TransportService ] [node-237] Received response for a request that has timed out, sent [30842ms] ago, timed out [705ms] ago, action [internal:discovery/zen/fd/ping], node [{node-235}{s2_LzCq1TxCWple_fIx9yg}{-q2P7Z8oQcW6d3uiOJaCtA}{0.0.0.235}{0.0.0.235:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [889561226]
[2019-04-14T01:48:28,394][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node-237] failed to execute on node [s2_LzCq1TxCWple_fIx9yg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node-235][0.0.0.235:9300][cluster:monitor/nodes/stats[n]] request_id [889579252] timed out after [15166ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1038) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.4.jar:6.5.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_152]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_152]
[2019-04-14T01:48:29,257][WARN ][o.e.t.TransportService ] [node-237] Received response for a request that has timed out, sent [15981ms] ago, timed out [815ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node-235}{s2_LzCq1TxCWple_fIx9yg}{-q2P7Z8oQcW6d3uiOJaCtA}{0.0.0.235}{0.0.0.235:9300}{ml.machine_memory=8589328384, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], id [889579252]
[2019-04-14T01:48:31,084][WARN ][o.e.m.j.JvmGcMonitorService] [node-237] [gc][young][5306164][2494235] duration [1.2s], collections [1]/[1.3s], total [1.2s]/[19.8h], memory [3gb]->[2.8gb]/[3.9gb], all_pools {[young] [255.1mb]->[66.5mb]/[266.2mb]}{[survivor] [29.9mb]->[27.8mb]/[33.2mb]}{[old] [2.7gb]->[2.7gb]/[3.6gb]}
[2019-04-14T01:48:36,126][WARN ][o.e.m.j.JvmGcMonitorService] [node-237] [gc][5306169] overhead, spent [519ms] collecting in the last [1s]
[2019-04-14T01:49:59,288][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [node-237] collector [cluster_stats] timed out when collecting data
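For what it's worth, when this happens I compare heap usage and GC counters across the nodes with a request like this (filter_path just trims the response to the relevant fields):

GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors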
Any advice would be much appreciated.