Just curious: how can I fix a broken cluster like this?
Using the official Helm release, I created a 3-node cluster, which was working fine:
node-0 / node-1 / node-2
I scaled the StatefulSet (STS) down to 0 to remove all pods/nodes.
Then, when I scaled back up, the first pod (node-0) could not elect a master because the other pods (node-1/node-2) were not up. This is the normal behavior (ES needs a quorum to prevent split brain), and I am OK with that:
master not discovered or elected yet, an election requires at least 2 nodes with ids from [jyUoUGcASqO8kQMrjNWSlQ, W7CrWAy-SuOtiWPy9C7eHw, ujsS2swvTzGQcEv__DoNcQ],
Without recreating the other nodes (let's say the servers have burnt!), how can I "force" this lone node to become the master?
I tried voting_config_exclusions, but this API requires ... an elected master (master_not_discovered_exception).
If you only have one out of three nodes remaining then you have lost data: the latest cluster state is only guaranteed to be stored on a majority of the master-eligible nodes, which might be the missing two. The solution is to form a new cluster and restore a recent snapshot.
I am benchmarking Elasticsearch against Vespa. I am a big advocate of ES and we
[quote="DavidTurner, post:4, topic:261467"]
No, because in a real situation in production you have snapshots from which to recover. Simultaneously losing two of your three masters should be extraordinarily rare.
[/quote]
Yes, we have the original data in HDFS (we are running a search engine, almost 1 PB), but it takes several days to re-index everything, and the data changes quite frequently (snapshotting is nearly impossible for us).
We have several clusters with a hot/cold architecture (docs can move from the cold to the hot cluster). Uptime is really crucial for us, which is why I am looking for "disaster recovery" solutions rather than immutable ones (it is also more "comfortable" for our boss to be able to say we can recover!).
So the cluster state is distributed among the master nodes, and if we lose one master node we can lose data?
No, it's always stored on a majority of the masters. If you have three master-eligible nodes then that means two of them. Thus if you lose a single master-eligible node then it's fine, one of the other two will also have the latest state.
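To make the majority argument concrete, here is a minimal Python sketch. It is a simplified model rather than Elasticsearch's actual implementation: it assumes the latest state was committed on some majority of the master-eligible nodes, and that we do not know which one. The function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n nodes."""
    return n // 2 + 1


def survivors_have_latest_state(nodes, survivors) -> bool:
    """Return True if every possible majority of `nodes` contains at least
    one surviving node, i.e. some survivor must hold the latest state
    regardless of which majority it was committed on (simplified model)."""
    quorum = majority(len(nodes))
    alive = set(survivors)
    return all(alive & set(group) for group in combinations(nodes, quorum))


nodes = ["node-0", "node-1", "node-2"]

print(majority(len(nodes)))                                      # 2
print(survivors_have_latest_state(nodes, ["node-0", "node-1"]))  # True
print(survivors_have_latest_state(nodes, ["node-0"]))            # False
```

With two of three nodes left, every possible majority overlaps the survivors, so the latest state is still present. With only node-0 left, the majority {node-1, node-2} may have been the one holding the latest state, which is why the lone node cannot be safely trusted.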