Master election problem in a 3-node cluster when one node died


(Emmett Hogan) #1

I have a pretty simple ELK stack here with 4 ES nodes (3 data nodes and 1 client node).

I lost one of the data nodes and all of a sudden my Logstash servers couldn't send anything to the ES cluster. I found this error on the LS nodes and on the surviving ES nodes:

{"error":
{"root_cause":
[{"type":"cluster_block_exception",
"reason":"blocked by: [SERVICE_UNAVAILABLE/2/no master];"}],
"type":"cluster_block_exception",
"reason":"blocked by: [SERVICE_UNAVAILABLE/2/no master];"},"status":503}",
:class=>"Elasticsearch::Transport::Transport::Errors::ServiceUnavailable", ....

I am assuming that this is because the ES cluster could no longer elect a master... but I thought that 2 surviving nodes out of a 3-node cluster were enough. Did I miss something along the way?

(Sorry if this is basic ES knowledge. I found a posting about someone else running into this problem as well during a rolling upgrade, but there was not a response.)

I am running ES version 2.2.0.

Thanks in advance.

-Emmett


(Christian Dahlqvist) #2

What is minimum_master_nodes set to?


(Emmett Hogan) #3

Uhhh....crap. I thought that it was automatically computed to (# nodes/2 + 1) if not defined...but now that I read the config...that's not exactly what it says. So...it's actually commented out in my config! Doh!

So...I should have:

discovery.zen.minimum_master_nodes: 2
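(For reference, the quorum rule the docs describe is integer `(master-eligible nodes / 2) + 1`. Here is a throwaway sketch of that arithmetic just to sanity-check the value — `quorum` is a local helper for illustration, not an ES command:)

```shell
# Quorum = (master-eligible nodes / 2) + 1, using integer division.
# "quorum" is just a local helper for illustration, not an ES command.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # 3-node cluster -> 2 (cluster survives the loss of one node)
quorum 4   # 4-node cluster -> 3
```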

I am guessing that the right way to update this would be:

  • Change it on all three nodes
  • Shut down all my logstash nodes so nothing is getting sent to ES.
  • Shut down each ES node
  • Start each ES node

Otherwise, I'll run into the same problem as soon as I restart ES on a node to reload the new config.
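As an aside, in 2.x `minimum_master_nodes` is also a dynamic cluster setting, so one way to sidestep the restart-ordering problem might be to push it through the cluster settings API first and then update `elasticsearch.yml` at leisure (sketch only — the host is a placeholder for any live node in the cluster):

```shell
# Host is a placeholder; point it at any live node in your cluster.
# The persistent setting survives full cluster restarts.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}'
```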

Right?

Thanks for your help!

-Emmett


(Emmett Hogan) #4

While I am changing things in my config...should I also set:

gateway.recover_after_nodes: 2
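i.e., something along these lines in `elasticsearch.yml` (a sketch for a 3-data-node cluster — the values here are illustrative, not recommendations):

```yaml
# Sketch for a 3-data-node cluster; values are illustrative.
gateway.recover_after_nodes: 2    # don't start recovery until 2 nodes have joined
gateway.expected_nodes: 3         # ...or start immediately once all 3 are back
gateway.recover_after_time: 5m    # otherwise wait up to 5m before recovering
```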

-Emmett


(Emmett Hogan) #5

Answering my own question...I found this...

https://www.elastic.co/guide/en/elasticsearch/reference/current/restart-upgrade.html

-E


(Emmett Hogan) #6

I just noticed something strange though.

I followed the right procedure for restarting my cluster:

  1. Turned off shard reallocation
  2. Bounced the nodes
  3. Turned on shard reallocation

and everything looked fine...except that the last node that I brought up only has replicas on it. No primary shards at all!
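For the record, steps 1 and 3 above map to the cluster settings API along these lines (the host is a placeholder; `"none"` and `"all"` are the 2.x values for `cluster.routing.allocation.enable`):

```shell
# Placeholder host; run against any node. Step 1: stop shard allocation.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ...restart the nodes one by one (step 2)...

# Step 3: re-enable allocation once the nodes have rejoined.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```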

-Emmett


(Mark Walkom) #7

That's nothing to be worried about :slight_smile: Primaries and replicas hold the same data; which copy happens to be primary just depends on which replicas were promoted while the node was down, and it doesn't affect data safety or searches.


(system) #8