I have a cluster with client, master, and data nodes (Elasticsearch 2.4.1).
When a data node runs into memory problems (e.g. "Caused by: java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects"), it tries to recover its state but fails.
As a result, I end up with a failed data node that Elasticsearch still treats as working, even though it is not. The master node logs the following:
ReceiveTimeoutTransportException[[es-data-4][10.1.53.3:9300][cluster:monitor/nodes/stats[n]] request_id [1101574] timed out after [15000ms]]
So the master knows that this node is malfunctioning.
But parsing the master node's logs is not a proper way to detect that. I'd like to monitor node state through the REST API, and the REST API reports that everything is fine. Is there a way to detect a failed node?
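
To make the kind of check I have in mind concrete, here is a minimal Python sketch. It compares the nodes listed by `GET /_cluster/state/nodes` with the nodes that actually answer `GET /_nodes/stats` within a timeout. It rests on an assumption I have not verified on 2.4.1: that a node which fails to respond in time is simply left out of the stats response. The helper names (`unresponsive_nodes`, `fetch_json`, `check`), the base URL, and the 15s timeout are placeholders of mine, not anything Elasticsearch provides.

```python
import json
from urllib.request import urlopen


def unresponsive_nodes(cluster_state, nodes_stats):
    """Return the names of nodes present in the cluster state but absent from the stats response."""
    known = cluster_state.get("nodes", {})
    answered = nodes_stats.get("nodes", {})
    # Fall back to the node ID when the cluster state entry has no name.
    return sorted(
        info.get("name", node_id)
        for node_id, info in known.items()
        if node_id not in answered
    )


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the body as JSON (the HTTP timeout is in seconds)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check(base_url="http://localhost:9200"):
    """Hypothetical entry point: base_url and the 15s timeout are assumptions."""
    state = fetch_json(base_url + "/_cluster/state/nodes")
    stats = fetch_json(base_url + "/_nodes/stats/jvm?timeout=15s")
    return unresponsive_nodes(state, stats)
```

Calling `check()` against a client node would then return the names of the data nodes that are still in the cluster state but did not return stats, if the assumption above holds.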