5 node cluster breaks when master is shut down

The elasticsearch.yml on the master is:

cluster.name: elasticsearchlogstashkibana
node.name: "elksrv1"
node.master: true
node.data: true
index.number_of_replicas: 2
index.number_of_shards: 5
indices.recovery.compress: false

The 4 other nodes are configured as above, but with:

cluster.name: elasticsearchlogstashkibana
node.name: "elksrv[2-5]"
node.master: false
node.data: true
index.number_of_replicas: 2
index.number_of_shards: 5
indices.recovery.compress: false

What are the settings to permit resiliency of master so that when it crashes or service is taken down the cluster survives?

Don't have a single dedicated master. For a five-node cluster it's most likely unnecessary and wasteful to have a dedicated master node, especially since a single one becomes a single point of failure, so just make all the data nodes master-eligible and drop the current dedicated master. If you insist on dedicated masters, you need three of them.

So the other nodes should have node.master: true instead of the present node.master: false? I.e. all nodes have node.master: true?

Oh, and make sure you set discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes (N/2 + 1, rounded down), i.e. 2 for three-node clusters and 3 for four- or five-node clusters.
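If you don't want to restart every node just to change that value, discovery.zen.minimum_master_nodes can also be set dynamically through the cluster settings API - a minimal sketch, assuming ES is reachable on localhost:9200:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 3 }
}'
# "persistent" survives full cluster restarts; use "transient" if it should only last until the next restart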

So I am hearing this:

cluster.name: elasticsearchlogstashkibana
node.name: "elksrv[1-5]"
node.master: true
node.data: true
index.number_of_replicas: 2
index.number_of_shards: 5
indices.recovery.compress: false
discovery.zen.minimum_master_nodes: 3

for all.

Yes. And you'll probably want to have a five node cluster since a four node cluster won't be able to survive two nodes being down (which I guess is the point of having two replicas?). If your current dedicated master isn't powerful enough to be a data node you can keep it as a pure master node, but that doesn't mean it'll actually be elected master.
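To spell out the quorum arithmetic behind that, with minimum_master_nodes set to 3 and all nodes master-eligible:

# 5 nodes, 2 down -> 3 master-eligible nodes remain, 3 >= 3, a master can still be elected
# 4 nodes, 2 down -> 2 master-eligible nodes remain, 2 < 3, no quorum, so no master can be elected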

Should discovery.zen.ping.unicast.hosts: have the list of all the nodes, nothing, or something else?

You don't have to list all the nodes, but you need to list enough nodes so that the cluster will be able to form even if some of the nodes are down. In your case you'll want to list at least three nodes, since two can be out of service. However, since you should be managing your config files with a configuration management tool that can generate files from templates, it might be just as easy to list all the nodes.

Okay - this seems to work, mostly, for each of the 5 nodes' elasticsearch.yml
(but see the concern below the config):
cluster.name: elasticsearchlogstashkibana
node.name: "elksrv5"
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 2
indices.recovery.compress: false
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["elksrv1.channel-corp.com","elksrv2.channel-corp.com","elksrv3.channel-corp.com","elksrv4.channel-corp.com","elksrv5.channel-corp.com"]
script.engine.groovy.inline.update: on

When I test this by taking the non-masters down, no problem. We go yellow, then green after it automatically reassigns a moderate number of shards. The process takes perhaps a minute or two.
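(For reference, the status transitions can be watched with the cluster health API - e.g., assuming the default HTTP port:)

curl -s 'http://localhost:9200/_cluster/health?pretty'
# "status" drops to yellow while replicas are reassigned, then returns to green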

When I test this by taking the master down, there is a problem. We go to red and stay red with an unassigned shard count (usually small - tested on two nodes). When I turn the now non-master back on (election of a new master does take place), the unassigned shard count goes back to zero after a minute or so and the cluster goes from red to yellow to green.

But the cluster does not recover to yellow or green from a down master, unless I am missing something.

So you're saying that shards stay unassigned even though a new master is elected and there is at least one replica of the shards in question? That's unexpected. Are there any clues in the ES logs?
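To see which shards are stuck and where their surviving copies live, the _cat APIs are useful alongside the logs - a quick sketch, assuming the default port:

# list all shards and filter for the ones that have no node
curl -s 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED
# per-index health shows which indices are keeping the cluster red
curl -s 'http://localhost:9200/_cluster/health?level=indices&pretty'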

Which version of Elasticsearch are you using? Are all nodes in the cluster the same version?
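(A quick way to check, assuming the default port - each node's version should show up in the _cat/nodes output:)

curl -s 'http://localhost:9200/_cat/nodes?v&h=name,version'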

One node is running 1.2.2.

The others are running 1.6.0.

Is this a problem?

How do I upgrade from 1.2.2 to 1.6.0?

Thanks in advance.

Yes, that is a problem. Different versions use different Lucene versions, so once a shard has been upgraded on one of the newer instances it can no longer be reallocated to the older node. There may also be other issues depending on which versions are in use, so all nodes in a cluster should always run the same version.

Instructions for upgrading can be found in the official Elasticsearch upgrade documentation.
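In rough outline - a sketch only, double-check the documentation for the exact steps between these particular versions - a rolling upgrade of the 1.2.2 node looks like this:

# 1. disable shard allocation so the cluster doesn't start shuffling shards while the node is down
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'
# 2. stop the 1.2.2 node, install the 1.6.0 package, and start the node again
# 3. once it has rejoined the cluster, re-enable allocation and wait for green
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'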