Cluster green but not responding

Hi everyone,

I'm facing this issue right now. My cluster reports green status, but all indexing and search operations are extremely slow.
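
In case it helps, this is roughly how I'm checking the cluster state and per-node heap usage (just the standard _cluster/health and _cat/nodes APIs; the host below is a placeholder for one of my nodes):

# overall cluster status (this is what reports "green")
curl -s 'http://localhost:9200/_cluster/health?pretty'

# per-node heap and load, to spot a node under memory pressure
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent,load_1m'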

Here are the last log lines from my master node:

[2017-06-21T16:01:58,026][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [SCCHIB4ESCB-10] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [-d72YxtkQSWtRryyUfhFEA]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:246) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:160) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:218) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:933) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.1.jar:5.4.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [SCCHIB4ESCB-01][172.17.55.33:9300][cluster:monitor/nodes/stats[n]] request_id [953212] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) ~[elasticsearch-5.4.1.jar:5.4.1]
        ... 4 more
[2017-06-21T16:02:37,642][WARN ][o.e.t.TransportService   ] [SCCHIB4ESCB-10] Received response for a request that has timed out, sent [54616ms] ago, timed out [39616ms] ago, action [cluster:monitor/nodes/stats[n]], node [{SCCHIB4ESCB-01}{-d72YxtkQSWtRryyUfhFEA}{tgx7hZTgQ8aJBNOxr0IDlQ}{172.17.55.33}{172.17.55.33:9300}], id [953212]
[2017-06-21T16:04:48,025][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [SCCHIB4ESCB-10] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [-d72YxtkQSWtRryyUfhFEA]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:246) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:160) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:218) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:933) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.1.jar:5.4.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [SCCHIB4ESCB-01][172.17.55.33:9300][cluster:monitor/nodes/stats[n]] request_id [953538] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) ~[elasticsearch-5.4.1.jar:5.4.1]
        ... 4 more
[2017-06-21T16:05:48,027][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [SCCHIB4ESCB-10] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [-d72YxtkQSWtRryyUfhFEA]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:246) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:160) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:218) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:933) [elasticsearch-5.4.1.jar:5.4.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.1.jar:5.4.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [SCCHIB4ESCB-01][172.17.55.33:9300][cluster:monitor/nodes/stats[n]] request_id [953650] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) ~[elasticsearch-5.4.1.jar:5.4.1]
        ... 4 more
[2017-06-21T16:05:57,061][WARN ][o.e.t.TransportService   ] [SCCHIB4ESCB-10] Received response for a request that has timed out, sent [84036ms] ago, timed out [69036ms] ago, action [cluster:monitor/nodes/stats[n]], node [{SCCHIB4ESCB-01}{-d72YxtkQSWtRryyUfhFEA}{tgx7hZTgQ8aJBNOxr0IDlQ}{172.17.55.33}{172.17.55.33:9300}], id [953538]
[2017-06-21T16:05:57,061][WARN ][o.e.t.TransportService   ] [SCCHIB4ESCB-10] Received response for a request that has timed out, sent [24034ms] ago, timed out [9034ms] ago, action [cluster:monitor/nodes/stats[n]], node [{SCCHIB4ESCB-01}{-d72YxtkQSWtRryyUfhFEA}{tgx7hZTgQ8aJBNOxr0IDlQ}{172.17.55.33}{172.17.55.33:9300}], id [953650]
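
The timeouts above all point at node SCCHIB4ESCB-01 (-d72YxtkQSWtRryyUfhFEA). In case it's useful, the hot threads API is what I'd normally use to see what that node is busy with (placeholder host again; when the node is stuck in GC like this, the call can hang as well):

# top 3 hottest threads on the suspect node
curl -s 'http://localhost:9200/_nodes/SCCHIB4ESCB-01/hot_threads?threads=3'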

And here are the last log lines from node SCCHIB4ESCB-01:

[2017-06-21T16:02:54,652][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137449][12289] duration [15.9s], collections [1]/[16.7s], total [15.9s]/[6.9h], memory [6.8gb]->[6.7gb]/[7.8gb], all_pools {[young] [104.5mb]->[56.3mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:02:54,652][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137449] overhead, spent [15.9s] collecting in the last [16.7s]
[2017-06-21T16:03:09,178][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137450][12290] duration [13.8s], collections [1]/[14.5s], total [13.8s]/[6.9h], memory [6.7gb]->[6.7gb]/[7.8gb], all_pools {[young] [56.3mb]->[6.8mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:03:09,178][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137450] overhead, spent [13.8s] collecting in the last [14.5s]
[2017-06-21T16:03:21,523][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137451][12291] duration [11.6s], collections [1]/[12.3s], total [11.6s]/[7h], memory [6.7gb]->[6.7gb]/[7.8gb], all_pools {[young] [6.8mb]->[8.6mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:03:21,523][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137451] overhead, spent [11.6s] collecting in the last [12.3s]
[2017-06-21T16:03:50,894][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137453][12292] duration [28.3s], collections [1]/[28.3s], total [28.3s]/[7h], memory [7.7gb]->[6.7gb]/[7.8gb], all_pools {[young] [990.8mb]->[2.2mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:03:50,895][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137453] overhead, spent [28.3s] collecting in the last [28.3s]
[2017-06-21T16:04:04,931][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137454][12293] duration [13.1s], collections [1]/[14s], total [13.1s]/[7h], memory [6.7gb]->[6.7gb]/[7.8gb], all_pools {[young] [2.2mb]->[23.4mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:04:04,931][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137454] overhead, spent [13.1s] collecting in the last [14s]
[2017-06-21T16:04:21,024][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137456][12294] duration [14.9s], collections [1]/[15s], total [14.9s]/[7h], memory [7.6gb]->[6.7gb]/[7.8gb], all_pools {[young] [890.1mb]->[6.2mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:04:21,024][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137456] overhead, spent [14.9s] collecting in the last [15s]
[2017-06-21T16:04:35,095][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137457][12295] duration [13.1s], collections [1]/[14s], total [13.1s]/[7h], memory [6.7gb]->[6.7gb]/[7.8gb], all_pools {[young] [6.2mb]->[5.3mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:04:35,095][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137457] overhead, spent [13.1s] collecting in the last [14s]
[2017-06-21T16:04:50,000][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137458][12296] duration [13.9s], collections [1]/[14.9s], total [13.9s]/[7h], memory [6.7gb]->[6.9gb]/[7.8gb], all_pools {[young] [5.3mb]->[177.4mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:04:50,000][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137458] overhead, spent [13.9s] collecting in the last [14.9s]
[2017-06-21T16:05:01,934][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][old][137459][12297] duration [11.2s], collections [1]/[11.9s], total [11.2s]/[7h], memory [6.9gb]->[6.8gb]/[7.8gb], all_pools {[young] [177.4mb]->[124mb]/[998.5mb]}{[survivor] [0b]->[0b]/[124.7mb]}{[old] [6.7gb]->[6.7gb]/[6.7gb]}
[2017-06-21T16:05:01,934][WARN ][o.e.m.j.JvmGcMonitorService] [SCCHIB4ESCB-01] [gc][137459] overhead, spent [11.2s] collecting in the last [11.9s]
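
From those GC lines the old generation stays completely full ([old] 6.7gb->6.7gb/6.7gb) while each collection takes 11-28 seconds, so the 7.8gb heap on that node looks exhausted. For reference, the heap size comes from config/jvm.options; I haven't pasted the real file, but based on the 7.8gb reported in the logs it should look roughly like this:

# config/jvm.options on SCCHIB4ESCB-01 (assumed values; an 8g max heap
# typically shows up as ~7.8-7.9gb in the GC log lines)
-Xms8g
-Xmx8g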
