Good afternoon! Please help me figure out what happened: today two nodes of my cluster lost each other and the cluster became inoperable. Below is the relevant part of the log from before and during the event:
[2019-02-28T09:43:52,747][DEBUG][o.e.a.s.TransportSearchAction] [elastic-hot] [graylog_608][3], node[euwPHAKwQLOLNfx0fps02g], [P], s[STARTED], a[id=d8Bhiy6fQeWnz0BTErc__g]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[graylog_608, graylog_609], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_alisases_to_multiple_indices=true, forbid_closed_indices=true], types=[message], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=10, batchedReduceSize=512, preFilterShardSize=64, source={
[2019-02-28T09:43:52,755][DEBUG][o.e.a.s.TransportSearchAction] [elastic-hot] [graylog_609][3], node[euwPHAKwQLOLNfx0fps02g], [P], s[STARTED], a[id=_eLHFO5qSeihsSPWdUL8UA]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[graylog_608, graylog_609], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_alisases_to_multiple_indices=true, forbid_closed_indices=true], types=[message], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=10, batchedReduceSize=512, preFilterShardSize=64, source={
[2019-02-28T09:43:52,757][DEBUG][o.e.a.s.TransportSearchAction] [elastic-hot] All shards failed for phase: [query]
[2019-02-28T09:43:58,825][INFO ][o.e.m.j.JvmGcMonitorService] [elastic-hot] [gc][58996] overhead, spent [251ms] collecting in the last [1s]
[2019-02-28T09:44:27,127][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [elastic-hot] failed to execute on node [vMZI77LoTyuHlEH-xVvJJQ]
[2019-02-28T09:44:43,845][INFO ][o.e.m.j.JvmGcMonitorService] [elastic-hot] [gc][59041] overhead, spent [420ms] collecting in the last [1s]
[2019-02-28T09:45:27,130][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [elastic-hot] failed to execute on node [vMZI77LoTyuHlEH-xVvJJQ]
[2019-02-28T09:45:30,320][INFO ][o.e.c.r.a.AllocationService] [elastic-hot] Cluster health status changed from [GREEN] to [RED] (reason: []).
[2019-02-28T09:45:30,321][INFO ][o.e.c.s.ClusterService ] [elastic-hot] removed {{elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm},}, reason: zen-disco-node-failed({elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm}), reason(failed to ping, tried [3] times, each with maximum [30s] timeout)
[2019-02-28T09:45:31,338][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [elastic-hot] failed to execute [indices:monitor/stats] on node [vMZI77LoTyuHlEH-xVvJJQ]
[2019-02-28T09:45:31,339][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [elastic-hot] failed to execute [indices:monitor/stats] on node [vMZI77LoTyuHlEH-xVvJJQ]
[2019-02-28T09:45:31,347][INFO ][o.e.c.r.DelayedAllocationService] [elastic-hot] scheduling reroute for delayed shards in [58.8s] (64 delayed shards)
[2019-02-28T09:45:53,089][INFO ][o.e.c.s.ClusterService ] [elastic-hot] added {{elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm},}, reason: zen-disco-node-join
[2019-02-28T09:46:23,109][WARN ][o.e.d.z.PublishClusterStateAction] [elastic-hot] timed out waiting for all nodes to process published state [82] (timeout [30s], pending nodes: [{elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm}])
[2019-02-28T09:46:23,156][WARN ][o.e.d.z.ElectMasterService] [elastic-hot] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])
[2019-02-28T09:46:23,157][WARN ][o.e.c.s.ClusterService ] [elastic-hot] cluster state update task [zen-disco-node-join] took [30s] above the warn threshold of 30s
[2019-02-28T09:46:53,173][WARN ][o.e.d.z.PublishClusterStateAction] [elastic-hot] timed out waiting for all nodes to process published state [83] (timeout [30s], pending nodes: [{elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm}])
[2019-02-28T09:46:53,253][WARN ][o.e.c.s.ClusterService ] [elastic-hot] cluster state update task [cluster_reroute(async_shard_fetch)] took [30s] above the warn threshold of 30s
[2019-02-28T09:47:46,578][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [elastic-hot] failed to execute on node [vMZI77LoTyuHlEH-xVvJJQ]
[2019-02-28T09:47:51,214][WARN ][o.e.d.z.PublishClusterStateAction] [elastic-hot] timed out waiting for all nodes to process published state [86] (timeout [30s], pending nodes: [{elastic-warm}{vMZI77LoTyuHlEH-xVvJJQ}{hIKLM1muRAOjZEbfWjNYaA}{192.168.150.108}{192.168.150.108:9300}{box_type=warm}])
[2019-02-28T09:47:51,348][WARN ][o.e.c.s.ClusterService ] [elastic-hot] cluster state update task [shard-started shard id [[graylog_605][2]], allocation id [GpHfybP4T4O-SdUhuRJSwA], primary term [0], message [after existing recovery], shard-started shard id [[graylog_605][0]], allocation id [PkQ1r-llQcW5KbHChhZvRQ], primary term [0], message [after existing recovery], shard-started shard id [[graylog_605][2]], allocation id [GpHfybP4T4O-SdUhuRJSwA], primary term [0], message [master {elastic-hot}{euwPHAKwQLOLNfx0fps02g}{hlIxMvyMTxO1wrpiZG4ymA}{192.168.150.109}{192.168.150.109:9300}{box_type=hot} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started], shard-started shard id [[graylog_605][1]], allocation id [gwhVa9NbQ0eHa9TPpakI1A], primary term [0], message [after existing recovery], shard-started shard id [[graylog_605][1]], allocation id [gwhVa9NbQ0eHa9TPpakI1A], primary term [0], message [master {elastic-hot}{euwPHAKwQLOLNfx0fps02g}{hlIxMvyMTxO1wrpiZG4ymA}{192.168.150.109}{192.168.150.109:9300}{box_type=hot} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started], shard-started shard id [[graylog_605][0]], allocation id [PkQ1r-llQcW5KbHChhZvRQ], primary term [0], message [master {elastic-hot}{euwPHAKwQLOLNfx0fps02g}{hlIxMvyMTxO1wrpiZG4ymA}{192.168.150.109}{192.168.150.109:9300}{box_type=hot} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]] took [30.1s] above the warn threshold of 30s
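The warning at 09:46:23 ("value for setting discovery.zen.minimum_master_nodes is too low") indicates that both nodes are master-eligible while the quorum setting is still 1. A minimal sketch of the static setting the warning refers to, assuming exactly two master-eligible nodes as reported in that line (quorum = 2/2 + 1 = 2); the cluster name and node names are placeholders, only the IPs are taken from the log:

  # elasticsearch.yml on each master-eligible node (sketch, not a verified config)
  cluster.name: graylog                 # placeholder, cluster name is not shown in the log
  node.name: elastic-hot                # "elastic-warm" on the second node
  node.master: true                     # both nodes are master-eligible in this election round
  discovery.zen.ping.unicast.hosts: ["192.168.150.109", "192.168.150.108"]
  # quorum of master-eligible nodes: (2 / 2) + 1 = 2
  discovery.zen.minimum_master_nodes: 2

Note that with only two master-eligible nodes a quorum of 2 means no master can be elected while either node is unreachable, so the usual recommendation is to have at least three master-eligible nodes; this is general Zen discovery guidance, not something stated in the log above.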