Hi Everyone,
I'm currently running ES 6.6.1 on a 48-node cluster with a replication factor of 1. I ran into a problem when one of the nodes had I/O issues: the whole cluster went red, all indices turned red, and I couldn't even execute _cat/nodes. The logs were full of errors like this one:
[2019-11-16T19:50:59,398][WARN ][r.suppressed ] [es1-master-01-...] path: /.kibana/doc/kql-telemetry%3Akql-telemetry, params: {index=.kibana, id=kql-telemetry:kql-telemetry, type=doc}
The errors were not related to the node that actually failed.
Only removing the faulty node allowed the cluster to recover. Is there any way to tell Elasticsearch to remove a node from the cluster if it doesn't respond for a while?