All shards failed after one node shutdown


#1

Can you help me figure out why all shards failed?
We have a 3-node cluster, and the master node (node-2) was shut down.
After that, search queries stopped working.
Should I change the timeouts in the yml file?
We are running ES 5.4.


[2017-12-20T19:59:05,638][INFO ][o.e.n.Node               ] [node-1] started
[2017-12-20T20:02:19,538][INFO ][o.e.d.z.ZenDiscovery     ] [node-1] master_left [{node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300}], reason [shut_down]
[2017-12-20T20:02:19,542][WARN ][o.e.d.z.ZenDiscovery     ] [node-1] master left (reason = shut_down), current nodes: nodes: 
   {node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300}, master
   {node-1}{OP5NYOkVRvmZiPPh9Yl_eA}{aLZlwCxZTYeN9L-qWLFTgQ}{172.16.65.114}{172.16.65.114:9300}, local
   {node-3}{iKDO2up0Q-K4O3D8mfH9Pg}{MBkOeCWdRXW3eESCMQnM6A}{172.16.67.71}{172.16.67.71:9300}

[2017-12-20T20:02:19,546][WARN ][o.e.c.NodeConnectionsService] [node-1] failed to connect to node {node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [node-2][172.16.65.117:9300] connect_timeout[1s]

[2017-12-20T20:02:22,557][INFO ][o.e.c.s.ClusterService   ] [node-1] new_master {node-1}{OP5NYOkVRvmZiPPh9Yl_eA}{aLZlwCxZTYeN9L-qWLFTgQ}{172.16.65.114}{172.16.65.114:9300}, reason: zen-disco-elected-as-master ([1] nodes joined)[{node-3}{iKDO2up0Q-K4O3D8mfH9Pg}{MBkOeCWdRXW3eESCMQnM6A}{172.16.67.71}{172.16.67.71:9300}]
[2017-12-20T20:02:22,564][WARN ][o.e.c.a.s.ShardStateAction] [node-1] [events_1513798599108][0] received shard failed for shard id [[events_1513798599108][0]], allocation id [1oqyw1pyS56-rrPolFUFqQ], primary term [0], message [master marked shard as active, but shard has not been created, mark shard as failed]
[2017-12-20T20:02:22,638][WARN ][o.e.i.c.IndicesClusterStateService] [node-1] [[events_1513798599108][0]] marking and sending shard failed due to [master marked shard as active, but shard has not been created, mark shard as failed]
[2017-12-20T20:02:22,638][WARN ][o.e.c.a.s.ShardStateAction] [node-1] [events_1513798599108][0] received shard failed for shard id [[events_1513798599108][0]], allocation id [FMBfOaJHRkCPPoAuruInnw], primary term [0], message [master marked shard as active, but shard has not been created, mark shard as failed]
[2017-12-20T20:02:22,714][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node-1] failed to execute on node [1S06dYPBQV6PXmofJkFkJQ]
org.elasticsearch.transport.NodeNotConnectedException: [node-2][172.16.65.117:9300] Node not connected

[2017-12-20T20:02:22,732][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[events_1513798599108][0], [events_1513798599108][0]] ...]).
[2017-12-20T20:02:22,811][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [YELLOW] to [RED] (reason: [{node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300} transport disconnected, {node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300} transport disconnected]).
[2017-12-20T20:02:22,812][INFO ][o.e.c.s.ClusterService   ] [node-1] removed {{node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300},}, reason: zen-disco-node-failed({node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300}), reason(transport disconnected)[{node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300} transport disconnected, {node-2}{1S06dYPBQV6PXmofJkFkJQ}{yeYNvi1wSria-5wdaBe1aw}{172.16.65.117}{172.16.65.117:9300} transport disconnected]
[2017-12-20T20:02:22,882][INFO ][o.e.c.r.DelayedAllocationService] [node-1] scheduling reroute for delayed shards in [59.9s] (1 delayed shards)
[2017-12-20T20:02:23,626][DEBUG][o.e.a.s.TransportSearchAction] [node-1] All shards failed for phase: [query]

Timeout settings:


discovery.zen.commit_timeout: 2s
discovery.zen.publish_timeout: 2s
discovery.zen.fd.ping_timeout: 1s
transport.tcp.connect_timeout: 1s
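
For comparison, here is a sketch of the same four settings at what I understand to be their documented defaults in 5.x (30s each). It is only a reference point, assuming the values above were lowered from those defaults; it is not a confirmed fix for the shard failures in the log.

```yaml
# elasticsearch.yml (sketch for comparison, not a verified fix)
# Values are the 5.x defaults as I understand them from the documentation.
discovery.zen.commit_timeout: 30s    # how long to wait for a cluster state change to be committed
discovery.zen.publish_timeout: 30s   # how long to wait for a cluster state to be published to all nodes
discovery.zen.fd.ping_timeout: 30s   # how long fault detection waits for a ping response
transport.tcp.connect_timeout: 30s   # socket connect timeout between nodes
```

With the 1s connect timeout above, the log shows node-1 giving up on node-2 after a single attempt (connect_timeout[1s]), so longer values may at least make the failure detection less abrupt.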

#2

This issue is a blocker for us since we don't know how to recover. Any help would be appreciated.

