Set Master node ID for discovery

Long story short, I screwed up during a migration to a new datastore and removed the original one from the config file before I was ready, causing some issues. Luckily, this is a new cluster build and not yet in production.

It appears that the hosts are looking for master node IDs that are not accurate, causing the election to fail. Host 1 is looking for the correct IDs that exist in the cluster; however, hosts 2 and 3 are now looking for incorrect IDs. I assume new IDs were generated during my failed migration and then reverted when I re-attached the original datastore. Is there a way to set the IDs that the hosts look for?

This is my last-ditch effort before I run unsafe-bootstrap on host 1 and then migrate the other hosts to the "new" cluster.
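For reference, this is roughly the command I have in mind for host 1 (just a sketch; it assumes a package install where the tool lives under /usr/share/elasticsearch/bin, and the node has to be stopped before running it):

    # stop Elasticsearch on host 1, then run the tool as the elasticsearch user
    sudo systemctl stop elasticsearch
    sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-node unsafe-bootstrap
    sudo systemctl start elasticsearch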

Here are the logs from each host.

HOST 1 - DATA NODE
[2020-11-20T16:40:23,125][WARN ][o.e.c.c.ClusterFormationFailureHelper] [vpelasticsearch01] master not discovered or elected yet, an election requires at least 2 nodes with ids from [hV7KOY-hTQye3MuY-E2axQ, pK5DiXRERLGrRdS4_q0ZZQ, KIPbHxcSTbK5evRQFSMRbA], have discovered [

{vpelasticsearch01}{KIPbHxcSTbK5evRQFSMRbA}{YhFHFbX4SwyDH9LPPoHf8Q}{10.156.5.174}{10.156.5.174:9300}{dilmrt}{ml.machine_memory=16644874240, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, 
{vpelasticsearch02}{pK5DiXRERLGrRdS4_q0ZZQ}{X3e4-6ZHQxSUVCrjmrRAtg}{10.156.5.175}{10.156.5.175:9300}{dilmrt}{ml.machine_memory=16644874240, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, 
{vpelasticsearch03}{hV7KOY-hTQye3MuY-E2axQ}{igzhfquWTriaWmub6-jbJA}{10.156.5.177}{10.156.5.177:9300}{dilmrtv}{ml.machine_memory=3962003456, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] which is a quorum; 

discovery will continue using [10.156.5.175:9300, 10.156.5.177:9300] from hosts providers and [{vpelasticsearch01}{KIPbHxcSTbK5evRQFSMRbA}{YhFHFbX4SwyDH9LPPoHf8Q}{10.156.5.174}{10.156.5.174:9300}{dilmrt}{ml.machine_memory=16644874240, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 175, last-accepted version 116049 in term 175

HOST 2 - DATA NODE
[2020-11-20T16:40:25,535][WARN ][o.e.c.c.ClusterFormationFailureHelper] [vpelasticsearch02] master not discovered or elected yet, an election requires at least 2 nodes with ids from [***A8E0pA1NQp24mby8Wexqiw***, hV7KOY-hTQye3MuY-E2axQ, pK5DiXRERLGrRdS4_q0ZZQ], have discovered [

{vpelasticsearch02}{pK5DiXRERLGrRdS4_q0ZZQ}{X3e4-6ZHQxSUVCrjmrRAtg}{10.156.5.175}{10.156.5.175:9300}{dilmrt}{ml.machine_memory=16644874240, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, 
{vpelasticsearch01}{KIPbHxcSTbK5evRQFSMRbA}{YhFHFbX4SwyDH9LPPoHf8Q}{10.156.5.174}{10.156.5.174:9300}{dilmrt}{ml.machine_memory=16644874240, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, 
{vpelasticsearch03}{hV7KOY-hTQye3MuY-E2axQ}{igzhfquWTriaWmub6-jbJA}{10.156.5.177}{10.156.5.177:9300}{dilmrtv}{ml.machine_memory=3962003456, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] which is a quorum; 

discovery will continue using [10.156.5.174:9300, 10.156.5.177:9300] from hosts providers and [{vpelasticsearch02}{pK5DiXRERLGrRdS4_q0ZZQ}{X3e4-6ZHQxSUVCrjmrRAtg}{10.156.5.175}{10.156.5.175:9300}{dilmrt}{ml.machine_memory=16644874240, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 177, last-accepted version 116081 in term 177

HOST 3 - VOTING ONLY
[2020-11-20T16:40:30,264][WARN ][o.e.c.c.ClusterFormationFailureHelper] [vpelasticsearch03] master not discovered or elected yet, an election requires at least 2 nodes with ids from [***A8E0pA1NQp24mby8Wexqiw***, hV7KOY-hTQye3MuY-E2axQ, ***dYmdCf-eRKaU8U2XJ80DiQ***], have discovered [

{vpelasticsearch03}{hV7KOY-hTQye3MuY-E2axQ}{igzhfquWTriaWmub6-jbJA}{10.156.5.177}{10.156.5.177:9300}{dilmrtv}{ml.machine_memory=3962003456, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, 
{vpelasticsearch01}{KIPbHxcSTbK5evRQFSMRbA}{YhFHFbX4SwyDH9LPPoHf8Q}{10.156.5.174}{10.156.5.174:9300}{dilmrt}{ml.machine_memory=16644874240, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, 
{vpelasticsearch02}{pK5DiXRERLGrRdS4_q0ZZQ}{X3e4-6ZHQxSUVCrjmrRAtg}{10.156.5.175}{10.156.5.175:9300}{dilmrt}{ml.machine_memory=16644874240, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] which is not a quorum; 

discovery will continue using [10.156.5.174:9300, 10.156.5.175:9300] from hosts providers and [{vpelasticsearch03}{hV7KOY-hTQye3MuY-E2axQ}{igzhfquWTriaWmub6-jbJA}{10.156.5.177}{10.156.5.177:9300}{dilmrtv}{ml.machine_memory=3962003456, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 179, last-accepted version 116210 in term 179

Edit: Asterisks placed around IDs that do not match the cluster.

No, because that's not the problem here. It's this:

last-accepted version 116210 in term 179

The latest cluster state has version ≥116210 and is held only by a majority of the voting masters, i.e. the nodes with IDs A8E0pA1NQp24mby8Wexqiw and dYmdCf-eRKaU8U2XJ80DiQ. If those nodes are gone, you have lost that cluster state, and we can't even tell you what data has been lost in the process.

I'd recommend starting again from a recent snapshot so you at least know what you're missing.
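If you do end up with a snapshot repository to restore from, the restore itself is straightforward; for example (repository and snapshot names here are placeholders):

    # list the snapshots in an existing repository
    curl -s "localhost:9200/_snapshot/my_backup/_all?pretty"

    # restore one of them into a freshly built cluster
    curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" \
      -H 'Content-Type: application/json' \
      -d '{"indices": "*", "include_global_state": true}'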

The three hosts listed are the only three hosts that ever existed. The IDs are messed up, which makes it look like hosts are missing, and that has caused the version to become incorrect as well. I might as well just pick one of the hosts and unsafe-bootstrap that one, I guess. Each node's data should still have all the correct information as far as templates, ILM policies, etc., so hopefully that works, unless there is another option I am unaware of. I do not have a snapshot to revert to yet as this is still a pre-production cluster.

This isn't true -- it's not possible to mess up the node IDs and the versions without also potentially messing up the rest of the cluster state. You must always have access to the most recent cluster state, since it contains things like the vital in-sync markers on shards.
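On a healthy cluster you can see those markers for yourself; something along these lines shows the in-sync allocation IDs the master tracks for each shard (field path as in the 7.x cluster state API):

    curl -s "localhost:9200/_cluster/state/metadata?filter_path=metadata.indices.*.in_sync_allocations&pretty"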

Since this is pre-prod I'd suggest starting again from the source data rather than take the risk of arbitrary data loss.

To which source data are you referring? I have attached the original data locations back to all the nodes; however, they are still out of sync. As I said, I don't care if I lose the shards. I was just trying to avoid rebuilding the cluster from scratch. My thought was that I could bootstrap one of the nodes and hope Logstash and Kibana still connected correctly, and build from there.

I mean the original source, wherever you got it from to put it into Elasticsearch in the first place.

IMO rebuilding the cluster from scratch is the right thing to do. The latest cluster state is lost so it's dangerous to try anything else.

Well I was hoping to avoid the rebuild as it has taken me a while to get to this point. Lesson learned. Thanks for the help.

Well, since I have nothing to lose, I decided to try the unsafe-bootstrap option. At the very least, I thought it would give me some experience with a worst-case scenario should this happen to me in the future. Here is my experience, which hopefully may help someone else along the way.

Because I knew what I did to screw it up, I decided to run it on the master node with the oldest version (contrary to the recommendation in the documentation), as that would (hopefully) contain the information from before my screw-up ruined it. Again, in this instance the only thing I did to screw it up was disconnect the data location, causing Elasticsearch to start up pointing to a new, empty location. I fixed the yaml file and re-attached the original data location, hoping the cluster information contained in there would still be readable.

I then ran the "elasticsearch-node unsafe-bootstrap" command and the Elasticsearch node started successfully, as expected. Next I checked whether Kibana would connect to it. This was my hope: that the authentication between Kibana and Elasticsearch hadn't been messed up and I could query the Elasticsearch data through the dev tools. To my surprise, it actually worked!

I then detached the other two nodes from the cluster, edited the elasticsearch.yml file to point to the correct master node, and restarted them. By some miracle the cluster re-formed and was in the yellow state. After re-enabling shard allocation, I now have my cluster back up and running and everything is green.

I may still rebuild in case something else bites me in the future; however, this way I can export all the customization that has been done up to this point and save myself a lot of work.
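In case it helps anyone following the same path, these are roughly the commands involved in that second half (a sketch rather than a verbatim transcript; paths assume a package install, the seed-hosts edit shown is representative rather than my exact config, and the output file names are just examples):

    # on each of the other two nodes, with Elasticsearch stopped, detach from the old cluster
    sudo systemctl stop elasticsearch
    sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch-node detach-cluster
    # edit /etc/elasticsearch/elasticsearch.yml so discovery.seed_hosts points at the
    # unsafe-bootstrapped node, then start the service again
    sudo systemctl start elasticsearch

    # once the cluster re-forms, re-enable shard allocation (add -u user:pass if security is on)
    curl -X PUT "localhost:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"persistent": {"cluster.routing.allocation.enable": null}}'

    # export the customizations worth keeping before any future rebuild
    curl -s "localhost:9200/_template?pretty" > legacy_templates.json
    curl -s "localhost:9200/_index_template?pretty" > index_templates.json
    curl -s "localhost:9200/_ilm/policy?pretty" > ilm_policies.json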
