Elasticsearch operates in a clustered mode, not master-slave. If you want a highly available cluster you therefore need a minimum of three master-eligible nodes.
In the old cluster A is the master: I set node.master: true in A's elasticsearch.yml and node.master: false in B's.
Stop A and update B's config (remove node.master: false from elasticsearch.yml) so that B can run standalone.
POST new data to node B.
Start node A again with node.master: false in its elasticsearch.yml. B now has node.master: true, so B is the master of the new cluster.
But the new data is lost!
This does not look like a split brain, since at any given time there was only one master. First A is master, then it is stopped and node B is made master (a single-node cluster), then A is made master-ineligible and added back to the cluster. But it seems that node A's shards override node B's shards?
Could this be an issue around allotting primary terms for shards when master B promoted its replica to primary?
How are the cluster state details from node A and node B reconciled when A joins back?
Stop A and update B's config (remove node.master: false from elasticsearch.yml) so that B can run standalone.
Since B was not a master-eligible node it doesn't have a full copy of the cluster metadata on disk, so it starts up empty. It will have some index metadata, but maybe not all of it, and what it has could also be stale. It imports any indices it finds as dangling indices, and blindly trusts the corresponding index metadata even though this could be stale (and that includes primary terms). Re-using a primary term like this breaks all sorts of assumptions on which we rely, so from that point on the behaviour of Elasticsearch is undefined.
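To make the point about primary terms a little more concrete, here is a toy model in Python. It is not Elasticsearch code, and the record function and the history dictionary are made up for illustration. It only assumes that a write is identified by its primary term together with a sequence number, and that the term is supposed to increase whenever a new primary takes over. It does not claim to reproduce what actually happened in the steps above.

```python
def record(history: dict, term: int, seq_no: int, doc: str) -> bool:
    """Store a write under its (term, seq_no) id.

    Returns False if that id already names a different write,
    i.e. the id no longer identifies a single operation.
    """
    key = (term, seq_no)
    if key in history and history[key] != doc:
        return False
    history[key] = doc
    return True


history = {}

# A write accepted while the original primary was in charge (term 1).
record(history, term=1, seq_no=5, doc="written via A")

# A newly promoted primary that bumps the term gets a distinct id.
ok_with_new_term = record(history, term=2, seq_no=5, doc="written via B")

# A primary that starts from stale metadata and reuses term 1 collides.
ok_with_stale_term = record(history, term=1, seq_no=5, doc="written via B")

print(ok_with_new_term, ok_with_stale_term)
```

In this toy the reused term produces two different writes with the same id, which is the kind of ambiguity that a strictly increasing term is meant to rule out.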
If I delete the node.master setting and set minimum_master_nodes to 1 on both A and B, then repeat the same steps, data is also lost.
But if I set minimum_master_nodes to 2 while A and B are both alive, and to 1 when only one node is alive, it works.
I would recommend trying to add a small dedicated master node somewhere. This does not require a lot of resources and would give you three master-eligible nodes even if it does not hold data.
If you want to handle that automatically, without manual intervention or risk of data loss, you need a minimum of three master-eligible nodes. There is no way around this.
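The arithmetic behind the three-node minimum can be sketched in a few lines of Python. The function names are made up for illustration. The only rule assumed is the usual majority one for minimum_master_nodes: half of the master-eligible nodes, rounded down, plus one.

```python
def quorum(master_eligible: int) -> int:
    """Majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3):
    print(
        f"{n} master-eligible node(s): minimum_master_nodes={quorum(n)}, "
        f"tolerates {tolerated_failures(n)} failure(s)"
    )
```

With two master-eligible nodes the majority is 2, so losing either node leaves no quorum. Lowering the setting to 1 lets each node elect itself on its own. Three is the smallest count at which a majority survives the loss of one node.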