Timeline part 2
node 143
13:03:57,790][WARN ][o.e.d.z.ZenDiscovery ] [i-node1] zen-disco-failed-to-publish, current nodes: nodes:
i-node1, local, master
i-node2
i-node3
node 143
13:03:59,797][INFO ][o.e.d.e.AwsEc2UnicastHostsProvider] [i-node1] Exception while retrieving instance list from AWS API: Unable to load credentials from Amazon EC2 metadata service
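The credentials error is a symptom rather than a cause: the EC2 metadata service sits on the link-local address 169.254.169.254, so failing to reach it suggests node 143's networking was already degraded at this point. A quick reachability check from the instance, sketched here (standard AWS metadata endpoint; the 2-second timeout is an arbitrary choice):

$ curl -s --max-time 2 http://169.254.169.254/latest/meta-data/instance-id \
      || echo 'metadata service unreachable'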
node 143
May 5 13:03:59 ip-10-176-252-143 kernel: [5112732.193843] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
node 143
13:04:16,145][WARN ][o.e.d.z.ZenDiscovery ] [i-node1] not enough master nodes, current nodes: nodes:
i-node1, local
i-node2
i-node3
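This is the quorum check doing its job: i-node1 failed to publish cluster state to a majority of master-eligible nodes and steps down as master (note it is no longer tagged master in the node list above), preventing a split brain. On a three-master cluster running pre-7.x Zen discovery, discovery.zen.minimum_master_nodes should be 2. A sketch of how to verify the setting and the current mastership, assuming default ports and local access:

# static setting in elasticsearch.yml on every master-eligible node
#   discovery.zen.minimum_master_nodes: 2

# ask the cluster who it currently considers master
$ curl -s 'localhost:9200/_cat/master?v'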
node 209
13:05:26,103][INFO ][o.e.c.s.ClusterService ] [i-node3] detected_master i-node2, reason: zen-disco-receive(from master [master i-node2 committed version [924723]])
node 24
13:05:26,573][INFO ][o.e.c.r.a.AllocationService] [i-node2] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[type3-w2018.18-16][1], [type1-m2018.05][1], [type2-w2018.18][1], [type3-w2018.18][1], [type3-w2018.18-16][1], [type3-w2018.18][1], [type3-w2018.18][1], [type2-w2018.18-16][0], [type2-w2018.18-4][2], [type3-w2018.18][1]] ...]).
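With i-node1 unreachable, i-node2 has taken over as master, failed the shard copies that lived on the lost node, and dropped the cluster to YELLOW: replicas are unassigned, but every index still has a live primary. A minimal sketch for watching this from any reachable node, assuming default ports:

$ curl -s 'localhost:9200/_cluster/health?pretty'
$ curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED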
node 143
2018-05-05T13:05:32,407][INFO ][o.e.d.z.ZenDiscovery ] [i-node1] master_left [i-node2], reason [transport disconnected]
node 209
13:06:02,908][INFO ][o.e.c.s.ClusterService ] [i-node3] removed {i-node1,}, reason: zen-disco-receive(from master [master i-node2 committed version [924728]])
node 24
13:06:03,919][WARN ][o.e.c.a.s.ShardStateAction] [i-node2] [type2-w2018.18-16][2] received shard failed for shard id [[type2-w2018.18-16][2]], allocation id [HBVZHVz3R0q0rfKFA5e_9Q], primary term [2], message [mark copy as stale]
node 143
May 5 13:07:35 ip-10-176-252-143 kernel: [5112948.732099] Detected Tx Unit Hang
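"Detected Tx Unit Hang" from the ixgbevf virtual-function driver means the NIC's transmit queue has stalled, which lines up with the transport disconnects Elasticsearch keeps reporting on this node. A hedged diagnostic sketch (the interface name eth0 is an assumption; substitute the actual device):

$ dmesg | grep -ic 'tx unit hang'            # how often the NIC has stalled
$ ethtool -S eth0 | grep -i -e err -e drop   # per-queue error and drop counters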
node 24
13:17:39,480][INFO ][o.e.c.s.ClusterService ] [i-node2] added {i-node1,}, reason: zen-disco-node-join[i-node1]
node 143
13:17:39,508][INFO ][o.e.c.s.ClusterService ] [i-node1] detected_master i-node2, reason: zen-disco-receive(from master [master i-node2 committed version [924787]])
node 24
13:17:46,267][WARN ][o.e.c.a.s.ShardStateAction] [i-node2] [type1-m2018.05][1] received shard failed for shard id [[type1-m2018.05][1]], allocation id [k1KOGv8sTemz5sx37LWMzA], primary term [0], message [failed to create shard], failure [IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[type1-m2018.05][1]: obtaining shard lock timed out after 5000ms]; ]
java.io.IOException: failed to obtain in-memory shard lock
node 24
13:17:46,411][WARN ][o.e.c.a.s.ShardStateAction] [i-node2] [type1-m2018.05][1] received shard failed for shard id [[type1-m2018.05][1]], allocation id [k1KOGv8sTemz5sx37LWMzA], primary term [0], message [master i-node2 has not removed previously failed shard. resending shard failure]
node 143
13:17:46,266][WARN ][o.e.i.c.IndicesClusterStateService] [i-node1] [[type1-m2018.05][1]] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
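The shard-lock loop here is fallout from the flapping rather than a new failure: i-node1 rejoined while its old copy of [type1-m2018.05][1] was still closing, so the freshly assigned copy cannot obtain the in-memory shard lock within the 5000ms timeout, and the master keeps re-sending the same failure. These usually resolve on their own once the node settles; if a shard stays unassigned, the allocation can be inspected and retried, as sketched below with assumed default ports:

$ curl -s 'localhost:9200/_cluster/allocation/explain?pretty'        # why the shard is unassigned
$ curl -s -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true' # retry allocations that hit the failure limit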
node 143
May 5 13:18:03 ip-10-176-252-143 kernel: [5113576.444117] Detected Tx Unit Hang
node 143
13:21:24,349][WARN ][o.e.i.c.IndicesClusterStateService] [i-node1] [[type2-w2018.18-64][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [type2-w2018.18-64][2]: Recovery failed from i-node2 into i-node1
node 24
13:24:31,780][WARN ][o.e.c.a.s.ShardStateAction] [i-node2] [type2-w2018.18-64][2] received shard failed for shard id [[type2-w2018.18-64][2]], allocation id [m2_66-FQT0K5c5Ns0PHMEA], primary term [0], message [failed recovery], failure [RecoveryFailedException[[type2-w2018.18-64][2]: Recovery failed from i-node2 into i-node1]; nested: RemoteTransportException[[i-node2][node2:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [type2-w2018.18-64][2] from primary shard with sync id but number of docs differ: 255794 (i-node2, primary) vs 255782 (i-node1)]; ]
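This final failure is the most telling one: both copies of [type2-w2018.18-64][2] carry the same sync id, which would normally let recovery skip copying files, but the document counts differ (255794 on the i-node2 primary vs 255782 on the i-node1 copy), so recovery refuses the shortcut and aborts rather than trust a stale replica. One common workaround on this version is to rebuild the replica from scratch by dropping and restoring the replica count, sketched here with assumed defaults (the restored count of 1 is an assumption about the index's original settings):

$ curl -s -XPUT 'localhost:9200/type2-w2018.18-64/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index.number_of_replicas": 0}'
# wait for the cluster to acknowledge, then restore the previous replica count
$ curl -s -XPUT 'localhost:9200/type2-w2018.18-64/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index.number_of_replicas": 1}'
$ curl -s 'localhost:9200/_cat/recovery?v&active_only=true'   # watch the rebuild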