ElasticSearch 2.1.0 goes 100% CPU, all shards offline


(Omar Al Zabir) #1

I have ElasticSearch 2.1.0 running on 3 nodes. One node goes to 100% CPU and all shards on it go offline; the elasticsearch process shows 156% CPU in top. The servers themselves are all up and running.

I see this in the log:

[2015-12-07 02:12:36,411][INFO ][cluster.service          ] [ec-rdl-2] removed {{ec-dyl-3}{ezyh-vmeTYegJsYs1417XQ}{
[2015-12-07 02:12:39,474][INFO ][cluster.service          ] [ec-rdl-2] detected_master {ec-dyl-3}{ezyh-vmeTYegJsYs1
[2015-12-07 02:18:26,410][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][3] now throttling index
[2015-12-07 02:18:26,461][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][3] stop throttling inde
[2015-12-07 03:12:39,160][INFO ][discovery.zen            ] [ec-rdl-2] master_left [{ec-dyl-3}{ezyh-vmeTYegJsYs1417
[2015-12-07 03:12:39,161][WARN ][discovery.zen            ] [ec-rdl-2] master left (reason = transport disconnected
[2015-12-07 03:12:39,161][INFO ][cluster.service          ] [ec-rdl-2] removed {{ec-dyl-3}{ezyh-vmeTYegJsYs1417XQ}{
[2015-12-07 03:12:42,191][INFO ][cluster.service          ] [ec-rdl-2] detected_master {ec-dyl-3}{ezyh-vmeTYegJsYs1
[2015-12-07 03:20:23,504][INFO ][discovery.zen            ] [ec-rdl-2] master_left [{ec-dyl-3}{ezyh-vmeTYegJsYs1417
[2015-12-07 03:20:23,505][WARN ][discovery.zen            ] [ec-rdl-2] master left (reason = failed to ping, tried
[2015-12-07 03:25:32,636][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] now throttling index
[2015-12-07 03:25:32,638][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] stop throttling inde
[2015-12-07 03:25:56,545][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] now throttling index
[2015-12-07 03:25:56,561][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] stop throttling inde
[2015-12-07 04:20:26,817][INFO ][discovery.zen            ] [ec-rdl-2] master_left [{ec-dyl-3}{ezyh-vmeTYegJsYs1417
[2015-12-07 04:20:26,818][WARN ][discovery.zen            ] [ec-rdl-2] master left (reason = transport disconnected
[2015-12-07 04:20:26,818][INFO ][cluster.service          ] [ec-rdl-2] removed {{ec-dyl-3}{ezyh-vmeTYegJsYs1417XQ}{
[2015-12-07 04:20:29,853][INFO ][cluster.service          ] [ec-rdl-2] detected_master {ec-dyl-3}{ezyh-vmeTYegJsYs1
[2015-12-07 04:26:54,869][INFO ][discovery.zen            ] [ec-rdl-2] master_left [{ec-dyl-3}{ezyh-vmeTYegJsYs1417
[2015-12-07 04:26:54,870][WARN ][discovery.zen            ] [ec-rdl-2] master left (reason = failed to ping, tried
[2015-12-07 04:26:54,870][INFO ][cluster.service          ] [ec-rdl-2] removed {{ec-dyl-3}{ezyh-vmeTYegJsYs1417XQ}{
[2015-12-07 04:26:57,935][INFO ][cluster.service          ] [ec-rdl-2] detected_master {ec-dyl-3}{ezyh-vmeTYegJsYs1
[2015-12-07 04:31:36,777][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] now throttling index
[2015-12-07 04:31:36,805][INFO ][index.engine             ] [ec-rdl-2] [topbeat-2015.12.07][0] stop throttling inde
[2015-12-07 05:26:58,719][INFO ][discovery.zen            ] [ec-rdl-2] master_left [{ec-dyl-3}{ezyh-vmeTYegJsYs1417
[2015-12-07 05:26:58,719][WARN ][discovery.zen            ] [ec-rdl-2] master left (reason = transport disconnected
[2015-12-07 05:26:58,720][INFO ][cluster.service          ] [ec-rdl-2] removed {{ec-dyl-3}{ezyh-vmeTYegJsYs1417XQ}{
[2015-12-07 05:26:58,734][WARN ][discovery.zen.ping.unicast] [ec-rdl-2] failed to send ping to [{ec-dyl-3}{ezyh-vme
SendRequestTransportException[[ec-dyl-3][10.35.76.37:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnect
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:44   

[2015-12-07 09:52:50,926][INFO ][rest.suppressed ] /_bulk Params: {}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
[2015-12-07 18:12:59,333][DEBUG][action.admin.indices.stats] [ec-rdl-2] [indices:monitor/stats] failed to execute o
[logstash-vasfulfilmenthelpdesk-helpdesklogs-2015.12.07][[logstash-vasfulfilmenthelpdesk-helpdesklogs-2015.12.07][0
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportR

[2015-12-07 18:13:04,391][DEBUG][action.admin.indices.stats] [ec-rdl-2] [indices:monitor/stats] failed to execute o
[logstash-vasfulfilmenthelpdesk-helpdesklogs-2015.12.07][[logstash-vasfulfilmenthelpdesk-helpdesklogs-2015.12.07][1
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportR


(Vincent Tran) #2

I don't see any errors in this log. It simply says that the master node has gone offline. What exactly do you see here that relates to the 100% CPU? Since your master has left, your cluster won't be green until a new master is elected. What is your discovery.zen.minimum_master_nodes set to?
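
For reference, the usual rule of thumb for discovery.zen.minimum_master_nodes in Elasticsearch 2.x is a quorum of the master-eligible nodes: (master_eligible_nodes / 2) + 1, rounded down. Here is a minimal sketch of that arithmetic. The function name is just illustrative, and it assumes all 3 of your nodes are master-eligible, which the thread does not confirm.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1.

    Hypothetical helper for illustration only; it is not part of
    Elasticsearch or any client library.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Assumption: every node in the 3-node cluster is master-eligible.
    nodes = 3
    print("discovery.zen.minimum_master_nodes:", minimum_master_nodes(nodes))
```

Under that assumption the value for a 3-node cluster would be 2, set in elasticsearch.yml on each node. If the setting is left at its default of 1, a node that loses contact with the master can elect itself, which risks a split brain.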


(Omar Al Zabir) #3

I do not have it set to anything explicitly.

