Recoverability from node replacement on a 2-node cluster

We are using the ECK operator (eck-operator) to run an Elasticsearch cluster on Kubernetes. The cluster has 3 master-eligible nodes and several worker nodes.

If the cluster runs on only 2 physical nodes and one of them goes down, we understand that the Elasticsearch cluster is not resilient to that failure (see "Resilience in small clusters" in the Elastic Docs).

Our question: when we replace the failed node with a fresh one that has none of the previous data, is there a technical way to recover the cluster while keeping the index data? Right now no master can be elected, because the physical node we replaced hosted 2 of the 3 master-eligible nodes. The majority is lost, so we have to re-establish the cluster to recover.
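A minimal sketch of the quorum arithmetic behind this, assuming the voting configuration consists of exactly the 3 master-eligible nodes and using hypothetical host names. A master can only be elected if a strict majority of the voting configuration is still reachable:

```python
from typing import Dict


def quorum(voting_nodes: int) -> int:
    """Smallest strict majority of the voting configuration."""
    return voting_nodes // 2 + 1


def survives_host_loss(placement: Dict[str, int], lost_host: str) -> bool:
    """True if the master-eligible nodes left after losing `lost_host`
    still form a majority of the voting configuration."""
    total = sum(placement.values())
    remaining = total - placement.get(lost_host, 0)
    return remaining >= quorum(total)


# Hypothetical layout: 3 master-eligible nodes spread over 2 physical hosts.
placement = {"host-a": 2, "host-b": 1}

for host in placement:
    status = "master can be elected" if survives_host_loss(placement, host) else "no quorum"
    print(f"lose {host}: {status}")
```

Losing host-a leaves 1 of 3 voting nodes, below the quorum of 2, which matches the situation described above. Losing host-b leaves 2 of 3, so a master can still be elected.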

There is no automatic way to recover the data if you permanently lose a majority of the master-eligible nodes. It may sometimes be possible to reconfigure the cluster, but that is not guaranteed to succeed, requires manually running the elasticsearch-node command-line tool (which is hard, maybe even impossible, to use on Kubernetes), and can result in data loss.

In your scenario you will need to set up a new cluster and restore data from a recent snapshot taken using the official snapshot API to recover.
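As a rough outline of that snapshot-and-restore flow, the sketch below only builds the (method, path, body) triples for the Elasticsearch snapshot APIs rather than sending them. The repository name, snapshot name, and location are hypothetical placeholders, and it assumes a shared filesystem ("fs") repository whose location is listed under path.repo on every node and lives on storage that outlives the cluster's nodes; on Kubernetes an object-store repository type is often the more practical choice.

```python
import json
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


def snapshot_and_restore_plan(repo: str, snapshot: str, location: str) -> List[Request]:
    """Return the API calls for taking a snapshot on the old cluster and
    restoring it on a freshly created one. All names are hypothetical."""
    repo_body = {"type": "fs", "settings": {"location": location}}
    return [
        # On the existing cluster, taken regularly and before any failure:
        ("PUT", f"/_snapshot/{repo}", repo_body),
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        # On the new cluster: register the same repository, then restore
        # (no body, so the default selection of indices is restored).
        ("PUT", f"/_snapshot/{repo}", repo_body),
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None),
    ]


for method, path, body in snapshot_and_restore_plan("backup-repo", "nightly-1", "/mnt/es-backups"):
    print(method, path, json.dumps(body) if body else "")
```

The point of the split is that the snapshot has to exist before the failure: once the master-eligible majority is gone, the old cluster can no longer take one.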

Er, just don’t put your 3 master-eligible instances on only 2 physical nodes?

Sorry, as you know the risks here, that’s just … unwise.

The physical node hosting the 3rd master-eligible instance doesn't need much in the way of resources.
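To see why a third physical node is needed at all, the sketch below enumerates every way of spreading 3 master-eligible nodes over a given number of hosts and keeps only the layouts that survive losing any single host. It assumes, as a simplification, that all master-eligible nodes are in the voting configuration; actually enforcing one master per host in ECK would be done separately, for example with pod anti-affinity in the nodeSet's podTemplate.

```python
from itertools import product
from typing import List, Tuple


def quorum(voting_nodes: int) -> int:
    """Smallest strict majority of the voting configuration."""
    return voting_nodes // 2 + 1


def tolerates_any_single_host_loss(placement: Tuple[int, ...]) -> bool:
    """placement[i] is the number of master-eligible nodes on host i."""
    total = sum(placement)
    return all(total - on_host >= quorum(total) for on_host in placement)


def resilient_placements(masters: int, hosts: int) -> List[Tuple[int, ...]]:
    """All ways to place `masters` nodes on `hosts` hosts that keep a
    quorum after the loss of any one host."""
    return [
        p
        for p in product(range(masters + 1), repeat=hosts)
        if sum(p) == masters and tolerates_any_single_host_loss(p)
    ]


print(resilient_placements(3, 2))  # []
print(resilient_placements(3, 3))  # [(1, 1, 1)]
```

With 2 hosts, one of them always carries at least 2 of the 3 master-eligible nodes, so no layout survives losing it. With 3 hosts, one master-eligible node per host is the only layout that does.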
