Elasticsearch cluster crashed when one node had I/O issues

Hi everyone,

I'm currently running ES 6.6.1 in a 48-node cluster with replication factor 1. When one of the nodes had I/O issues, the whole cluster went into the Red state: all indices turned red, and I couldn't even execute _cat/nodes. The logs contained lots of errors like this one:

[2019-11-16T19:50:59,398][WARN ][r.suppressed ] [es1-master-01-...] path: /.kibana/doc/kql-telemetry%3Akql-telemetry, params: {index=.kibana, id=kql-telemetry:kql-telemetry, type=doc}

The errors were not related to the node that actually failed.

Only removing the sick node allowed the cluster to recover. Is there any way to tell Elasticsearch to remove a node from the cluster if it doesn't respond for a while?
