Cluster locks up if master node filesystem becomes read-only

There are two solutions.

The first is ES-only. ES does not shut down automatically because org.elasticsearch.env.NodeEnvironment keeps a java.nio.file.FileStore that is never checked by calling its isReadOnly() method at regular intervals.

The java.nio.file.FileStore of every writable path would have to be monitored so that the node can perform an emergency stop when a store turns read-only.

To make the cluster drop a node with a read-only file store, you would have to modify the code and submit a patch.
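To give an idea of what such a check might look like, here is a minimal sketch in plain Java. It is not Elasticsearch code: the class ReadOnlyStoreWatcher, its methods anyReadOnly and watch, and the callback are hypothetical names made up for illustration. It looks up the FileStore again on every poll, on the assumption that a FileStore instance obtained earlier may not reflect a later remount.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Elasticsearch.
class ReadOnlyStoreWatcher {

    // Returns true if any path sits on a read-only store or cannot be inspected.
    static boolean anyReadOnly(List<Path> paths) {
        for (Path path : paths) {
            try {
                // Look up the store on each call instead of caching it
                // (assumption: a cached instance may hold stale mount options).
                FileStore store = Files.getFileStore(path);
                if (store.isReadOnly()) {
                    return true;
                }
            } catch (IOException e) {
                // A path whose store cannot be resolved is treated as failed.
                return true;
            }
        }
        return false;
    }

    // Polls the given paths at a fixed rate and runs the callback on failure.
    static ScheduledExecutorService watch(List<Path> paths, long periodSeconds,
                                          Runnable onReadOnly) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "read-only-store-watcher");
                t.setDaemon(true); // do not keep the JVM alive just for polling
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            if (anyReadOnly(paths)) {
                onReadOnly.run();
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

In a real patch the callback would have to trigger whatever shutdown path ES provides for a node. Here it is just a Runnable, so the sketch says nothing about how the node would actually leave the cluster.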

The second solution: JVM-based methods are not sufficient to detect general hardware malfunctions. Hence the OS, not ES, is the best place to implement this. You have to set up server monitoring software that understands SNMP or IPMI, or triggers for mcelog https://github.com/andikleen/mcelog, and that can kill ES (and other) processes in case of severe events.