Detect failed data node

I have a setup with client, master and data nodes (Elasticsearch 2.4.1).

When a data node has memory problems (e.g. "Caused by: java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects"), it tries to restore its state but fails.

As a result, I end up with a failed data node. Elasticsearch thinks that this data node still works, but it doesn't. The master node logs the following:

ReceiveTimeoutTransportException[[es-data-4][10.1.53.3:9300][cluster:monitor/nodes/stats[n]] request_id [1101574] timed out after [15000ms]]

So the master knows that this node is malfunctioning.

But parsing the master node's logs is not a proper way to detect that. I'd like to monitor node state using the REST API, but the REST API shows that everything is fine. Is there a way to detect a failed node?
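
One check that might work, as a rough sketch only: the timeout in the log above comes from a nodes stats request, so a script could compare the nodes listed by `GET /_cluster/state/nodes` with the nodes that actually answer `GET /_nodes/stats`. This assumes that a node which times out is simply missing from the `nodes` map of the stats response, which I have not verified on 2.4.1. The function names and the `base_url` default are hypothetical.

```python
import json
from urllib.request import urlopen


def find_unresponsive_nodes(cluster_state, nodes_stats):
    """Return names of nodes that are in the cluster state but
    absent from the nodes stats response (sorted for stable output)."""
    known = cluster_state.get("nodes", {})
    responded = set(nodes_stats.get("nodes", {}))
    return sorted(
        info.get("name", node_id)
        for node_id, info in known.items()
        if node_id not in responded
    )


def fetch_json(url, timeout=30):
    """GET a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_cluster(base_url="http://localhost:9200"):
    """Hypothetical entry point; base_url is an assumption."""
    # Nodes the master still considers part of the cluster.
    state = fetch_json(base_url + "/_cluster/state/nodes")
    # Nodes that actually answered the stats request
    # (assumption: a timed-out node is omitted from "nodes").
    stats = fetch_json(base_url + "/_nodes/stats")
    return find_unresponsive_nodes(state, stats)
```

Calling `check_cluster()` from a monitoring job would return a list of node names such as `es-data-4` when a node is in the cluster state but did not answer; an empty list would mean every known node responded.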
