Catch-22 with minimum_master_nodes setting


#1

In my three-node cluster I have minimum_master_nodes set to 2. Occasionally I need to drop down to 1 node for maintenance purposes. If I forget to lower the minimum to 1 before doing this, I end up stuck. Attempts to set minimum_master_nodes to 1 after dropping down to 1 node result in:

curl -XPUT http://127.0.0.1:9500/_cluster/settings -d '{ "persistent" : { "discovery.zen.minimum_master_nodes" : 1} }'

{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

Is there any way to force this setting through?
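
For context, the commonly documented rule for this setting is a quorum of master-eligible nodes, (N / 2) + 1 with integer division. The sketch below only works through that arithmetic for the numbers in this post; `quorum` and `can_elect_master` are hypothetical helper names, not Elasticsearch commands.

```shell
#!/bin/sh
# Quorum of N master-eligible nodes: integer division, then add one.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds when the visible master-eligible nodes reach the configured minimum.
# $1 = master-eligible nodes currently up, $2 = minimum_master_nodes
can_elect_master() {
  [ "$1" -ge "$2" ]
}
```

With three nodes, `quorum 3` gives 2. With one node left and the minimum still at 2, `can_elect_master 1 2` fails, so no master is elected and the settings update has nobody to accept it.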


(Magnus Bäck) #2

What if you change that setting before you take down two of the three nodes?

I don't get why you have to take two nodes offline at the same time. Unless you have two replicas the cluster would become red anyway.
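
A minimal sketch of that ordering, assuming the same endpoint as in the first post. `mmn_payload` and `set_mmn` are hypothetical helper names; the request itself is the same cluster settings call shown above, and the Content-Type header is only required by newer versions.

```shell
#!/bin/sh
# Endpoint taken from the original post; override ES_URL for another cluster.
ES_URL="${ES_URL:-http://127.0.0.1:9500}"

# Build the settings body for a given minimum.
mmn_payload() {
  printf '{ "persistent" : { "discovery.zen.minimum_master_nodes" : %d } }' "$1"
}

# Send the update. This only succeeds while a master is still elected,
# so it has to run BEFORE the cluster drops below the current minimum.
set_mmn() {
  curl -sS -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(mmn_payload "$1")"
}

# Intended order (not executed here):
#   set_mmn 1    # while at least two nodes are still up
#   ...stop nodes, do maintenance, bring all three back...
#   set_mmn 2    # restore the quorum value
```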


#3

The reason why doesn't matter; the question is how to get out of this situation. It seems a bit poor that you can lock yourself out of the product like this if you forget to reduce minimum_master_nodes before reducing the number of nodes. As I understand it, the persistent settings take precedence over what's in the yml file, so you can't even set it in the yml and restart.


#4

Just an additional note: our customer isn't taking two nodes down at the same time, but one after the other. The shards reorganize after the first node is removed and before the second is removed, so we end up with one node in a yellow state.

It seems the only solution is to add another node back so that it can form a cluster with the remaining one, but the customer may not always have another Elasticsearch system available to do this.


(Magnus Bäck) #5

The reason why doesn't matter; the question is how to get out of this situation. It seems a bit poor that you can lock yourself out of the product like this if you forget to reduce minimum_master_nodes before reducing the number of nodes.

You can always spin up a temporary master-only instance to get out of it. That instance can run on an existing machine.
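
A rough sketch of what the configuration for such a temporary instance might contain, assuming zen discovery with unicast hosts. `temp_master_config` is a hypothetical helper, and the cluster name and the transport address of the stranded node are placeholders to replace with real values.

```shell
#!/bin/sh
# Print an elasticsearch.yml fragment for a temporary master-only node.
# $1 = cluster.name of the stranded node (must match exactly)
# $2 = transport host:port of the stranded node (assumed, e.g. 127.0.0.1:9300)
temp_master_config() {
  cat <<EOF
cluster.name: $1
node.name: temp-master
node.master: true
node.data: false
discovery.zen.ping.unicast.hosts: ["$2"]
EOF
}

# Example (not executed here): write the fragment into a separate config dir.
#   temp_master_config my-cluster 127.0.0.1:9300 > /tmp/temp-master/elasticsearch.yml
```

Once the temporary node has joined, two master-eligible nodes are visible, a master can be elected, and the settings request from the first post should go through. The temporary node can be stopped afterwards.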

Just an additional note: our customer isn't taking two nodes down at the same time, but one after the other. The shards reorganize after the first node is removed and before the second is removed, so we end up with one node in a yellow state.

The customer should not take down a node unless the cluster is green.
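
One way to check this before stopping a node is the cluster health API with `wait_for_status`. The sketch below assumes the endpoint from the first post; `health_url`, `health_status` and `safe_to_stop_node` are hypothetical helper names, and the status is pulled out with a simple sed expression rather than a JSON parser.

```shell
#!/bin/sh
# Endpoint taken from the original post; override ES_URL for another cluster.
ES_URL="${ES_URL:-http://127.0.0.1:9500}"

# Health URL that blocks until the cluster is green or the timeout expires.
health_url() {
  printf '%s/_cluster/health?wait_for_status=green&timeout=%s' "$ES_URL" "${1:-60s}"
}

# Extract the "status" field from the health response read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Succeeds only if the cluster reports green within the timeout.
safe_to_stop_node() {
  status=$(curl -sS "$(health_url "$1")" | health_status)
  [ "$status" = "green" ]
}

# Example (not executed here):
#   safe_to_stop_node 120s && echo "ok to stop the next node"
```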


#6

Thanks Magnus, so spinning up a temporary system is the only way out of this. I can't reveal details of our product and how it works on public forums like this, but this is a very real scenario for us (it may sound daft, but there are valid reasons for it). I can put this advice on spinning up a temp system in our troubleshooting instructions.


(Jörg Prante) #7

You can easily work around that issue by slightly changing your cluster setup. The idea is to decouple master nodes from data nodes.

Set up three master nodes (no client, no HTTP, no data, so very small resource consumption) and set minimum master nodes to 3. Keep an eye on these nodes. They manage the cluster-wide state and must always be up. Do not change the minimum master nodes setting. If you have to remove a master node, add a new master node before removing the one you want to decommission.

With the dedicated master nodes in place, you can add or remove as many client (HTTP) / data nodes as you wish, in whatever order you wish.
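
As an illustration of that split, the node roles could be expressed roughly as follows. This is only a sketch: `role_config` is a hypothetical helper, and the settings shown (`node.master`, `node.data`, `http.enabled`) are the ones used by zen-discovery-era releases, so check them against the version in use.

```shell
#!/bin/sh
# Print the role-related elasticsearch.yml lines for one kind of node.
# $1 = master | data | client
role_config() {
  case "$1" in
    master)  # dedicated master: no data, no HTTP
      printf 'node.master: true\nnode.data: false\nhttp.enabled: false\n' ;;
    data)    # data node: never master-eligible
      printf 'node.master: false\nnode.data: true\n' ;;
    client)  # client (HTTP) node: neither master-eligible nor holding data
      printf 'node.master: false\nnode.data: false\n' ;;
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}

# Example (not executed here):
#   role_config data >> /path/to/data-node/elasticsearch.yml
```

Because only the three dedicated nodes are master-eligible, stopping a data or client node does not change how many master-eligible nodes are visible.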


#8

Thanks for taking the time to give the extra advice, Jörg. Unfortunately, in the product I work on, which embeds Elasticsearch, this isn't viable. We're actually breaking many of the ES recommendations, but it still works awesomely 99.999% of the time. I just needed to know if there was a way to 'break through' and reset minimum_master_nodes to 1 in the scenario I described.

