Hi, I need a little bit of help here.
I have a cluster with just two master-eligible nodes (yeah, now I know) and did the following:
- stopped elasticsearch on both nodes
- copied an unimportant index's directory to a backup location
- deleted the index directory from both nodes
- started elasticsearch
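In shell terms, the steps above looked roughly like this (paths and the index UUID are from memory, so treat them as assumptions, not exact commands):

```shell
# On both nodes: stop the service first
sudo systemctl stop elasticsearch

# Copy the (unimportant) index directory to a backup location.
# <index-uuid> stands in for the actual directory name under indices/.
sudo cp -a /var/lib/elasticsearch/nodes/0/indices/<index-uuid> /backup/

# Delete it from the live data path (done on both nodes)
sudo rm -rf /var/lib/elasticsearch/nodes/0/indices/<index-uuid>

# Start the service again
sudo systemctl start elasticsearch
```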
At this point I got an error that the nodes cannot start because they know of an index that no longer exists. I then restored the index on one machine, and that node was able to start, but the cluster still failed to form because both master-eligible nodes are required (we recently migrated to 7.1).
So I copied the index to the other machine, thinking it would start with the copy from the first node. It did not. And since I had deleted the folder on that node, it can't start at all.
So this is the pickle: one node won't start because it refuses to come up without the index, and the cluster won't form because the second node is down. It's a vicious circle.
Next, I started the second machine with an empty data dir, thinking it would sync once the cluster was live. But now both nodes are waiting for a node ID that no longer exists, even though they can see each other:
[2019-07-29T10:17:47,020][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ES1] master not discovered or elected yet, an election requires two nodes with ids [mgFa73WPSVSg3edz5Zmfqg, oloM9SZBRtSRBXjMV4uLFA], have discovered [{ES2}{oloM9SZBRtSRBXjMV4uLFA}{2mEpzKqFQB2PFvVwE6K8SA}{192.168.3.41}{192.168.3.41:9300}{ml.machine_memory=135144009728, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [192.168.3.41:9300] from hosts providers and [{ES1}{mgFa73WPSVSg3edz5Zmfqg}{r61aRxtBT3id96EkDDi-3A}{192.168.3.40}{192.168.3.40:9300}{ml.machine_memory=135144009728, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2019-07-29T10:18:12,197][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ES2] master not discovered or elected yet, an election requires a node with id [DwJ0DZvNTAatJvpFK_Otqw], have discovered [{ES1}{mgFa73WPSVSg3edz5Zmfqg}{r61aRxtBT3id96EkDDi-3A}{192.168.3.40}{192.168.3.40:9300}{ml.machine_memory=135144009728, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [192.168.3.40:9300] from hosts providers and [{ES2}{oloM9SZBRtSRBXjMV4uLFA}{2mEpzKqFQB2PFvVwE6K8SA}{192.168.3.41}{192.168.3.41:9300}{ml.machine_memory=135144009728, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 7, last-accepted version 648 in term 7
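For reference, the discovery settings in elasticsearch.yml on both nodes look roughly like this (reconstructed from memory; the IPs and node names match the log lines above, but the cluster name is an assumption):

```yaml
# elasticsearch.yml on ES1 (ES2 mirrors it with node.name: ES2)
cluster.name: my-cluster                               # assumed name
node.name: ES1
discovery.seed_hosts: ["192.168.3.40", "192.168.3.41"]
cluster.initial_master_nodes: ["ES1", "ES2"]
```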
Any idea how to solve this? There has to be a way to recover from a missing index when you have a copy of it.
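The closest thing I've found so far is the `elasticsearch-node` tool that ships with 7.x, but the docs warn loudly about data loss, so I'm hesitant to run it without advice. Sketching what I think the invocation would be (I have not run these yet):

```shell
# Last-resort commands from the 7.x elasticsearch-node tool.
# Both must be run with the node's elasticsearch process STOPPED,
# and both can lose cluster state -- I'm only quoting the docs here.

# On the node that still has the restored index: force it to elect
# itself master even though the two-node quorum can't be reached.
bin/elasticsearch-node unsafe-bootstrap

# On the other (emptied) node: detach it from the old cluster state
# so it can join the freshly bootstrapped cluster.
bin/elasticsearch-node detach-cluster
```

Is this actually the right escape hatch here, or is there a safer way to tell the cluster to forget the missing index?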