MasterNotDiscoveredException


(LeFinc) #1

Hi everybody,

I've been running a 2-node elasticsearch cluster for the last couple of months with a total of around 1TB of indexed data.

Today, during heavy load, both nodes became unresponsive. I brought Node 1 down with a kill (perhaps not a good idea, in hindsight).

Node 2 does not respond to kill. The server no longer listens on 9200, but the process is still hanging around even though no disk I/O appears to be taking place.

I can bring Node 1 back up, so that 9200 responds again with the usual message. However, when I check the cluster health, I get the following error:

http://192.168.1.69:9200/_cluster/health
=>
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

and a test search gives me:
http://192.168.1.69:9200/_search?pretty=1
=>
{
"error" : "ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];]",
"status" : 503
}
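
For anyone who wants to repeat these two checks from a script rather than a browser, here is a minimal sketch, assuming Python 3 and only the standard library. The helper names `classify` and `probe` are made up for illustration, the 35-second client timeout is an assumption chosen to sit just above the 30s wait shown in the error, and the matching is done on the error strings quoted above, so other versions may word them differently.

```python
import json
import urllib.error
import urllib.request


def classify(body):
    """Map a raw response body to a short status label (hypothetical helper)."""
    try:
        doc = json.loads(body)
    except ValueError:
        return "unparseable"
    if not isinstance(doc, dict):
        return "unparseable"
    error = doc.get("error")
    if error is None:
        return "ok"
    # The error may be a plain string (as in the post) or a nested object.
    text = error if isinstance(error, str) else json.dumps(error)
    if "MasterNotDiscoveredException" in text or "no master" in text:
        return "no_master"
    if "state not recovered" in text:
        return "state_not_recovered"
    return "other_error"


def probe(base_url, path="/_cluster/health", timeout=35):
    """Fetch one endpoint; the body of a 503 is read instead of raised."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # Non-2xx answers such as 503 still carry the JSON error body.
        body = exc.read().decode("utf-8", "replace")
    except OSError:
        # Connection refused, timeout, DNS failure and similar.
        return "unreachable"
    return classify(body)
```

Calling `probe("http://192.168.1.69:9200")` checks cluster health on Node 1, and `probe("http://192.168.1.69:9200", "/_search?pretty=1")` repeats the test search. A node whose port is closed, like Node 2 here, should come back as "unreachable".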

Is there anything I can do to recover my data or have I messed this up for good? At the moment, neither node responds to shutdown via Shutdown API either.

Thanks for your help!

