I'm running Elastic 7.13.3 in a 3-node cluster on Windows at a customer site. Within 3-4 hours two SSDs failed, so of the 3 nodes only one is still up and running. However, I cannot access this node, because it reports that no master node can be elected. That is true, since 2 nodes are down, but I assumed that with 2 replicas of every shard this would be easy: even if 2 nodes fail, one node is still operable. So my question is: how do I get this node running? (I tried changing the .yml file and even starting another node with no data, without success.) Is there a way to start a single node and have at least this one up and running?
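The replica reasoning holds for the data itself. A minimal sketch of the worst-case arithmetic, assuming (as Elasticsearch's allocator does by default) that two copies of the same shard are never placed on the same node. The function name is hypothetical and only illustrates the count; it says nothing about master election, which is a separate mechanism.

```python
def worst_case_surviving_copies(num_nodes: int, replicas: int, failed_nodes: int) -> int:
    """Copies of one shard left after `failed_nodes` nodes are lost, in the worst case.

    Assumption: copies of a shard are never co-located, so the primary plus its
    replicas occupy min(1 + replicas, num_nodes) distinct nodes.
    """
    copies_on_distinct_nodes = min(1 + replicas, num_nodes)
    # Worst case: every failed node happened to hold a copy of this shard.
    return max(0, copies_on_distinct_nodes - failed_nodes)


# The scenario from the question: 3 nodes, 2 replicas, 2 nodes lost.
print(worst_case_surviving_copies(num_nodes=3, replicas=2, failed_nodes=2))  # 1
```

So with 2 replicas on 3 nodes, every shard should still have one copy on the surviving node; the blocker is the missing master, not missing shard data.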
I was wondering: wouldn't it be better to have 3 single-node clusters and distribute the data between them, so that we wouldn't end up in such an uncomfortable situation?
Thanks Christian. I'm really worried about the data, since I can start the node but it is not accessible: it keeps saying that no master node could be elected, even though I've reduced the cluster to one node in the .yml file. Could there be something in the data that leads to this issue? I don't know, but I might be doing something wrong!
If a majority of master-eligible nodes has been lost, there is no safe way to get the cluster back up. The docs I linked to recommend setting up a new cluster and restoring from a snapshot. If this is not possible, you may need to use the utility I linked to perform an unsafe bootstrap, which could lead to data loss. I have never had to do this, so I will leave it to others with more practical experience in this area to comment or provide feedback.
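The majority rule above is plain arithmetic. A minimal sketch, assuming all 3 nodes are master-eligible and all 3 are in the cluster's voting configuration; the helper names are hypothetical. Elasticsearch 7.x elects a master only when a strict majority of the voting configuration is available, and as far as I understand the 7.x docs, that configuration lives in the stored cluster state, so editing `elasticsearch.yml` on the surviving node does not shrink it (`cluster.initial_master_nodes` only matters when a brand-new cluster bootstraps for the first time).

```python
def majority(voting_nodes: int) -> int:
    """Smallest vote count that is a strict majority of `voting_nodes`."""
    return voting_nodes // 2 + 1


def can_elect_master(voting_nodes: int, nodes_up: int) -> bool:
    """True if enough voting nodes are up to reach a strict majority."""
    return nodes_up >= majority(voting_nodes)


print(majority(3))             # 2
print(can_elect_master(3, 1))  # False: 1 surviving node out of 3 is not a majority
```

This is why 2 replicas do not help here: replicas protect shard data, while master election needs 2 of the 3 voting nodes.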
Thank you very much, Christian. I really appreciate your help and will provide follow-up info on how to deal with such a case, which is very unlikely but unfortunately real!