Stale master elected in elastic version < 7

johny1 · May 3, 2022, 9:28pm

We are running version 6.x, which relies on the minimum master nodes setting for quorum.
The cluster has three master-eligible nodes (1, 2, 3), and the active master was (1). One master node (3) left the cluster due to an unknown error and rejoined when it was restarted manually a few days later. At that point, node (3) had stale data, and I presume it would gradually sync its internal state from the active master (1). However, at that moment the active master (1) left the cluster and (3) became the active master. This caused dangling handlers to shards created in the past few days (while 3 was down) and led to loss of data. My question is: why did Elasticsearch elect (3) as active master while it had stale data and was still syncing from (1)? Does it provide any protection in this scenario and prefer (2) over (3) for master election?

warkolm · May 3, 2022, 10:51pm

Welcome to our community!
First things first: 6.X is EOL and no longer supported, so you should be looking to upgrade as a matter of urgency.

Can you post the output from the _cluster/stats?pretty&human API for us to look at? It'll help provide more context on your cluster.

johny1 · May 3, 2022, 11:03pm

As it's an internal cluster, I cannot post the details or output of the call. Here are some stats that might help. Please let me know if you need specific details:

60 data nodes
3 master eligible nodes
minimum master nodes = 2
400 indexes
2600 shards replication 1:1
16 billion documents
70,000 segments
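
For context on the figures above: the usual guidance for 6.x is to set minimum master nodes to a strict majority of the master-eligible nodes, that is (master-eligible nodes / 2) + 1 using integer division. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and used only for illustration.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return a strict majority of the master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Quorum for the 3 master-eligible nodes listed above.
print(minimum_master_nodes(3))
```

With 3 master-eligible nodes this gives 2, which matches the setting listed above and means an election can still proceed when only two of the three nodes are present.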

In general, what criteria does Elasticsearch use to elect an active master from 2 or 3 eligible nodes? Does the staleness of a master's state factor into this?

DavidTurner · May 4, 2022, 3:31am

It does, but in V6 and earlier it's not watertight. You should upgrade to a version that isn't EOL as a matter of urgency.
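
As a rough illustration of how staleness can factor into an election, here is a toy Python model in which each candidate reports the version of the last cluster state it accepted, and the candidate with the highest version wins, with ties broken by node ID. This is a simplified sketch under those assumptions, not Elasticsearch's actual election code; the class, function, node names, and version numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    node_id: str                # hypothetical node identifier
    cluster_state_version: int  # last cluster state version this node accepted


def elect(candidates: List[Candidate], quorum: int) -> Optional[Candidate]:
    """Pick a master only if at least `quorum` candidates are visible."""
    if len(candidates) < quorum:
        return None
    # Highest cluster state version first; the lowest node_id breaks ties.
    return min(candidates, key=lambda c: (-c.cluster_state_version, c.node_id))


# Hypothetical versions: node-3 was down for a few days, node-2 was not.
stale = Candidate("node-3", 95)
fresh = Candidate("node-2", 120)
winner = elect([stale, fresh], quorum=2)
print(winner.node_id if winner else "no quorum")
```

In this toy model the fresher node is chosen whenever both candidates take part in the comparison. It says nothing about the edge cases in 6.x where this protection is not watertight.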

johny1 · May 4, 2022, 12:23pm

We are actively planning to upgrade to the latest version soon. Do you have a reference on this topic? It would help us improve our capacity planning against failure modes.

DavidTurner · May 4, 2022, 1:06pm

What do you mean by a reference? The EOL docs already linked in this thread show that you're using an unacceptably old version. You're missing out on literally years of bugfixes and performance improvements.

johny1 · May 4, 2022, 2:40pm

I am looking for a reference in the latest docs that explains this behavior (around master election) in more detail, and that would also help explain why this scenario would have been avoided in current versions.

DavidTurner · May 4, 2022, 2:52pm

Perhaps "Discovery and cluster formation" in the Elasticsearch Guide [8.2]?

system · June 1, 2022, 2:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
