Master not discovered error

esrcon · September 4, 2019, 6:40pm

Hello,

I created a cluster containing 2 data/master nodes and 3 master-only nodes on AWS EC2.
Because of the bootstrap feature, I specified the IPs of the two data nodes in cluster.initial_master_nodes.

After the cluster started successfully, I shut the nodes down.
When I restarted the services, I got the error message below, saying the master was not discovered.

What I don't understand is that the log says this node must discover the two nodes with IPs .54 and .249, yet it also says 'have discovered' .54 and .249. What does this mean? Thanks in advance.

[2019-09-04T18:13:06,436][WARN ][o.e.c.c.ClusterFormationFailureHelper] [265c11851f2e] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [10.158.114.54, 10.158.114.249] to bootstrap a cluster: have discovered [{a85f4d457006}{07GysAOyT2m_VYcuKX1f-w}{Fch5OMfzQ6-MGl8QC5Bjhg}{10.158.114.68}{10.158.114.68:9300}, {cbc7d32e43be}{KKNNl2sJReuqC_Nkegop0g}{ScySSIWOTI6biCdHAOkFHg}{10.158.114.235}{10.158.114.235:9300}, {532457f50a27}{OQlMkdEZQ9ecPKyORE1x5w}{-Gfo9t75QiSOdy9oxRGb4w}{10.158.114.249}{10.158.114.249:9300}{aws_availability_zone=us-east-1c}, {035580c7e862}{OdgujABbRcWS-PPN3wrtoA}{OMrQ_3HlReWmBizc1y-YKA}{10.158.114.54}{10.158.114.54:9300}{aws_availability_zone=us-east-1b}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 10.158.114.235:9300, 10.158.114.137:9300, 10.158.114.249:9300, 10.158.114.68:9300, 10.158.114.54:9300] from hosts providers and [{265c11851f2e}{5DfCY0qiQE2GnnYZaFohbg}{-5s4Ai0nSFuTGYPSfTx8RQ}{10.158.114.137}{10.158.114.137:9300}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

DavidTurner · September 4, 2019, 6:45pm

There is something wrong with your configuration if you get this message on a restart:

this node has not previously joined a bootstrapped (v7+) cluster

From your description it sounds like these nodes had previously joined a cluster. Are you sure that you are using persistent storage for your master nodes?

Does it work to use the node names instead of their IP addresses?

DavidTurner · September 4, 2019, 6:51pm

Also, are 10.158.114.54 and 10.158.114.249 master-eligible? Which version are you using, exactly?

esrcon · September 4, 2019, 7:29pm

Thanks for the quick reply, David.

Yes, .54 and .249 are master-eligible.

The version is 7.1.1. (Actually, I'm using opendistro 1.1.0 from AWS.)

The other nodes, which are master-only, do not use persistent storage; the data (also master) nodes (.54, .249) do.
After a full shutdown, I tried starting the data nodes first and then the master-only nodes, and that works.

The error message I was asking about appeared when I started the master-only nodes first after a full shutdown. I was expecting those master-only nodes to be treated as new nodes that would discover .54/.249 after starting and then join the existing cluster. According to the log, they did discover .54 and .249, but it doesn't say why they cannot join.

DavidTurner · September 4, 2019, 7:33pm

OK, this doesn't work. All master-eligible nodes need persistent storage.

esrcon · September 4, 2019, 7:53pm

OK.
Since we are using AWS EC2 instances, what if some master nodes are terminated and we cannot get their storage back? Or will it be OK as long as more than half of the master-eligible nodes still have their persistent data?

Another question: if the whole cluster were down and the storage of all the master nodes were lost, but the storage of the data-only nodes remained, is there a way to recover our data?
Thank you.

DavidTurner · September 4, 2019, 8:03pm

You must have persistent storage on all master-eligible nodes. Then the cluster will tolerate the loss of a minority of them.
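
To put numbers on "a minority", here is a small Python sketch of the majority arithmetic. It is plain arithmetic, not Elasticsearch code: the function names `quorum` and `tolerated_losses` are made up for illustration, and it assumes every master-eligible node takes part in elections.

```python
def quorum(master_eligible: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_losses(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


# Print the figures for a few cluster sizes, including the five nodes in this thread.
for n in (2, 3, 5):
    print(f"{n} master-eligible: quorum {quorum(n)}, tolerates {tolerated_losses(n)} lost")
```

Under that assumption, five master-eligible nodes can lose at most two, so three master-only nodes all coming back with empty storage would already be one too many.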

Yes, you can restore a snapshot into a new cluster.

DavidTurner · September 5, 2019, 6:58am

I worked out why you were having the original problem, by the way. The node whose logs you shared isn't one of the nodes listed in cluster.initial_master_nodes, so it cannot trigger the initial election (nodes that aren't listed can trigger other elections after the initial one, but we're not there yet). However, the nodes that are listed in cluster.initial_master_nodes were failing to perform an election for some other reason, which would have been described in their logs.
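
As a rough illustration of how to read that warning, the Python sketch below pulls the required and discovered nodes out of the message posted at the top of this thread. It is not part of Elasticsearch: `bootstrap_report` and its `local_address` parameter are hypothetical names, and the regular expressions assume the 7.1-style message format shown in that log, which may differ in other versions.

```python
import re

# Relevant part of the warning from the first post, split only for readability.
SAMPLE = (
    "master not discovered yet, this node has not previously joined a "
    "bootstrapped (v7+) cluster, and this node must discover master-eligible "
    "nodes [10.158.114.54, 10.158.114.249] to bootstrap a cluster: have discovered "
    "[{a85f4d457006}{07GysAOyT2m_VYcuKX1f-w}{Fch5OMfzQ6-MGl8QC5Bjhg}"
    "{10.158.114.68}{10.158.114.68:9300}, "
    "{cbc7d32e43be}{KKNNl2sJReuqC_Nkegop0g}{ScySSIWOTI6biCdHAOkFHg}"
    "{10.158.114.235}{10.158.114.235:9300}, "
    "{532457f50a27}{OQlMkdEZQ9ecPKyORE1x5w}{-Gfo9t75QiSOdy9oxRGb4w}"
    "{10.158.114.249}{10.158.114.249:9300}{aws_availability_zone=us-east-1c}, "
    "{035580c7e862}{OdgujABbRcWS-PPN3wrtoA}{OMrQ_3HlReWmBizc1y-YKA}"
    "{10.158.114.54}{10.158.114.54:9300}{aws_availability_zone=us-east-1b}]"
    "; discovery will continue using"
)


def bootstrap_report(message: str, local_address: str) -> dict:
    """Summarise a bootstrap-phase 'master not discovered yet' warning."""
    required_match = re.search(
        r"must discover master-eligible nodes \[([^\]]*)\] to bootstrap", message
    )
    discovered_match = re.search(
        r"have discovered \[(.*?)\]; discovery will continue", message
    )
    if required_match is None or discovered_match is None:
        raise ValueError("not a bootstrap-phase 'master not discovered' message")

    # Entries from cluster.initial_master_nodes, as echoed in the message.
    required = [e.strip() for e in required_match.group(1).split(",") if e.strip()]
    # Every {...} field of every discovered node: names, IDs, hosts, attributes.
    tokens = set(re.findall(r"\{([^{}]*)\}", discovered_match.group(1)))

    return {
        "required": required,
        "missing": [e for e in required if e not in tokens],
        "local_node_is_listed": local_address in required,
    }


# 10.158.114.137 is the address of the node that wrote the log in the first post.
print(bootstrap_report(SAMPLE, "10.158.114.137"))
```

For the posted log this reports nothing missing but `local_node_is_listed` as false, which matches the explanation above: the node has found both listed nodes, but it is not listed itself, so it has to wait for one of them to win the initial election.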

It is strange to have two data-and-master nodes and three master-only nodes. It is unusual to want five master-eligible nodes in your cluster. I think you should either have three dedicated master nodes and two data-only nodes, or else two data-and-master nodes and one extra master-only node. If you were using 7.3 and weren't using OpenDistro then you could make the extra master-only node a voting-only master node to ensure that it never actually becomes the master, meaning it would need less CPU and heap.

esrcon · September 5, 2019, 7:02pm

Thanks for the explanation, that's helpful.

