Detect failed data node

kay_kay · January 12, 2017, 4:45pm

I have a cluster with client, master, and data nodes (Elasticsearch 2.4.1).

When a data node runs into memory problems (e.g. "Caused by: java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects"), it tries to restore its state but fails.

As a result, I end up with a failed data node. Elasticsearch thinks this data node is still working, but it isn't. The master node logs the following:

ReceiveTimeoutTransportException[[es-data-4][10.1.53.3:9300][cluster:monitor/nodes/stats[n]] request_id [1101574] timed out after [15000ms]]

So the master knows that this node is malfunctioning.

But parsing the master node's logs is not a proper way to detect that. I'd like to monitor node state through the REST API, and the REST API shows that everything is fine. So is there a way to detect a failed node?
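To make concrete the kind of REST-based check I have in mind, here is a minimal Python sketch. It rests on assumptions I have not verified on 2.4.1: that `GET /_nodes/stats` accepts a `timeout` parameter, and that a node which does not answer in time is simply left out of the response while still being listed in `GET /_cluster/state/nodes`. The base URL, the 10s timeout, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, path, timeout_s=20):
    """GET base_url + path and decode the JSON body."""
    with urlopen(base_url + path, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def unresponsive_nodes(cluster_state, nodes_stats):
    """Names of nodes listed in the cluster state but absent from the stats response."""
    known = cluster_state.get("nodes", {})
    answered = nodes_stats.get("nodes", {})
    return sorted(
        info.get("name", node_id)
        for node_id, info in known.items()
        if node_id not in answered
    )


def check(base_url="http://localhost:9200"):
    """Compare cluster membership with the nodes that actually returned stats."""
    # Assumption: both endpoints return a top-level "nodes" object keyed by node id.
    state = fetch_json(base_url, "/_cluster/state/nodes")
    # Assumption: "timeout" bounds how long the coordinating node waits per data node.
    stats = fetch_json(base_url, "/_nodes/stats/jvm?timeout=10s")
    return unresponsive_nodes(state, stats)
```

If those assumptions hold, calling `check()` against a client node would return something like a list containing `es-data-4` while that node is stuck, and an empty list otherwise.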

system · February 9, 2017, 4:46pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
