Force a node to be the master

ebuildy · January 18, 2021, 7:40pm

Just curious: how do you fix a broken cluster like this one?

Using the official Helm chart, I created a 3-node cluster, and it was working fine:

node-0 / node-1 / node-2

I scaled the StatefulSet (STS) down to 0 to remove all pods/nodes.

Then, when I scale back up, the first pod (node-0) cannot elect a master because the other pods (node-1/node-2) are not up. This is the expected behavior (to prevent split brain, ES needs a quorum), and I am OK with that:

master not discovered or elected yet, an election requires at least 2 nodes with ids from [jyUoUGcASqO8kQMrjNWSlQ, W7CrWAy-SuOtiWPy9C7eHw, ujsS2swvTzGQcEv__DoNcQ],
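The error message reflects majority-based elections: with three master-eligible node IDs in the voting configuration, an election needs votes from at least 2 of them. A minimal Python sketch of that arithmetic, assuming a simplified model where the quorum is a strict majority of the voting configuration (an illustration only, not Elasticsearch's code; the function names are hypothetical):

```python
from __future__ import annotations


def quorum_size(voting_config: set[str]) -> int:
    """Smallest number of votes that is a strict majority of the voting configuration."""
    return len(voting_config) // 2 + 1


def can_elect_master(voting_config: set[str], reachable: set[str]) -> bool:
    """True if the reachable nodes include a majority of the voting configuration."""
    votes = len(voting_config & reachable)
    return votes >= quorum_size(voting_config)


# Node IDs copied from the error message; which ID belongs to which pod is unknown.
config = {"jyUoUGcASqO8kQMrjNWSlQ", "W7CrWAy-SuOtiWPy9C7eHw", "ujsS2swvTzGQcEv__DoNcQ"}

print(quorum_size(config))                                   # 2
print(can_elect_master(config, {"jyUoUGcASqO8kQMrjNWSlQ"}))  # False: one node alone
print(can_elect_master(config, {"jyUoUGcASqO8kQMrjNWSlQ",
                                "W7CrWAy-SuOtiWPy9C7eHw"}))  # True: two of three
```

In this model a lone pod can never reach the quorum of 2 on its own, which is exactly the state the scaled-up StatefulSet is stuck in.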

Without recreating the other nodes (let's say the servers have burnt down!), how can I "force" this lone node to become the master?

I tried voting_config_exclusions, but this API requires... an elected master (master_not_discovered_exception).
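For reference, in recent 7.x versions the exclusion is requested with `POST /_cluster/voting_config_exclusions?node_names=...`. The sketch below only builds that request with the Python standard library and does not send it, because, as the error shows, the call can only be processed once a master has been elected. The base URL `http://localhost:9200` and the node names are assumptions for illustration.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request


def voting_exclusion_request(base_url: str, node_names: list[str]) -> Request:
    """Build (but do not send) a request asking the elected master to
    remove the given nodes from the voting configuration."""
    query = urlencode({"node_names": ",".join(node_names)})
    return Request(f"{base_url}/_cluster/voting_config_exclusions?{query}", method="POST")


# Assumed local endpoint and node names; sending this to a cluster with no
# elected master fails with master_not_discovered_exception.
req = voting_exclusion_request("http://localhost:9200", ["node-1", "node-2"])
print(req.get_method(), req.full_url)
```

Note that `urlencode` percent-encodes the comma as `%2C`, which the server decodes back to a comma-separated list.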

DavidTurner · January 18, 2021, 8:44pm

If you only have one out of three nodes remaining then you have lost data: the cluster state is only stored on a majority of the master-eligible nodes, which might be the missing two. The solution is to form a new cluster and restore a recent snapshot.
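To see why a single surviving node is not enough: a cluster state update counts as committed once a majority of the master-eligible nodes has accepted it, but any majority will do. The brute-force sketch below (a simplified model of that rule, not Elasticsearch's implementation; function names are hypothetical) checks whether a set of survivors is guaranteed to include at least one member of every majority that might hold the latest committed state.

```python
from __future__ import annotations

from itertools import combinations


def possible_commit_sets(masters: list[str]) -> list[set[str]]:
    """Every minimal majority of the master-eligible nodes. The latest state may
    have been committed on any of them; larger sets only make overlap easier."""
    q = len(masters) // 2 + 1
    return [set(c) for c in combinations(masters, q)]


def survivors_have_latest_state(masters: list[str], survivors: set[str]) -> bool:
    """True only if every possible commit majority overlaps the survivors."""
    return all(commit & survivors for commit in possible_commit_sets(masters))


masters = ["node-0", "node-1", "node-2"]
print(survivors_have_latest_state(masters, {"node-0"}))            # False
print(survivors_have_latest_state(masters, {"node-0", "node-1"}))  # True
```

With three master-eligible nodes, node-0 alone fails the check because the latest state might live only on {node-1, node-2}, while any two survivors pass it.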

ebuildy · January 18, 2021, 9:00pm

D'oh!

So in a real production situation, if a split brain occurs, we lose the cluster and its data?

Why not provide an API such as "/_cluster/state/master" to set the master, so we could try to recover without losing data?

DavidTurner · January 18, 2021, 9:05pm

No, because in a real situation in production you have snapshots from which to recover. Simultaneously losing two of your three masters should be extraordinarily rare.

You can't solve this with an API: you already lost data when the second master died.
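The same majority rule is what prevents split brain: any two majorities of the same set of master-eligible nodes share at least one node, so two disjoint groups can never both elect a master. Forcing a lone node to act as master amounts to accepting a quorum of 1, which loses that guarantee. A small brute-force check of this intersection property, assuming plain size-based quorums (an illustration with hypothetical function names, not Elasticsearch code):

```python
from __future__ import annotations

from itertools import combinations


def quorums(nodes: list[str], q: int) -> list[set[str]]:
    """All subsets with at least q members, i.e. every group allowed to elect a master."""
    return [set(c) for size in range(q, len(nodes) + 1) for c in combinations(nodes, size)]


def quorums_always_overlap(nodes: list[str], q: int) -> bool:
    """True if any two quorums share a node, so two disjoint groups cannot both elect a master."""
    qs = quorums(nodes, q)
    return all(a & b for a in qs for b in qs)


nodes = ["node-0", "node-1", "node-2"]
print(quorums_always_overlap(nodes, q=2))  # True: majority quorum, no split brain
print(quorums_always_overlap(nodes, q=1))  # False: {node-0} and {node-1, node-2} are disjoint
```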

ebuildy · January 18, 2021, 9:13pm

Well, OK, thank you.

ebuildy · January 19, 2021, 9:12am

I am benchmarking Elasticsearch against Vespa; I am a big advocate of ES, and we...
[quote="DavidTurner, post:4, topic:261467"]
No, because in a real situation in production you have snapshots from which to recover. Simultaneously losing two of your three masters should be extraordinarily rare.
[/quote]

Yes, we have the original data in HDFS (we are running a search engine with almost 1 PB of data), but it takes several days to re-index everything, and the data changes quite frequently (snapshotting is practically impossible for us).

We have several clusters with a hot/cold architecture (docs can move from the cold cluster to the hot one). Uptime is really crucial for us, which is why I am looking for "disaster recovery" solutions rather than immutable ones (also, it is more "comfortable" for our boss to be able to say we can recover!).

So the cluster state is distributed across the master nodes, and if we lose one master node we can lose data?

DavidTurner · January 19, 2021, 9:26am

No, it's always stored on a majority of the masters. If you have three master-eligible nodes then that means two of them. Thus if you lose a single master-eligible node then it's fine, one of the other two will also have the latest state.

system · February 16, 2021, 9:26am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
