I'm looking into different ways to deploy Elasticsearch on Kubernetes.
My understanding so far is that the standard approach is the ECK operator, which deploys Elasticsearch as StatefulSets. Each data node pod in the StatefulSet uses a persistent volume claim (PVC) to get a disk. When a data node's pod goes down, Kubernetes brings up a replacement pod that re-attaches the same disk via the PVC.
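To make sure we're talking about the same setup, here's a minimal sketch of what I mean (the cluster name, version, node count, and storage size are placeholders I made up). ECK turns each nodeSet into a StatefulSet, and the volumeClaimTemplates entry becomes the per-pod PVC:

```sh
kubectl apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster            # hypothetical cluster name
spec:
  version: 8.13.0
  nodeSets:
  - name: data
    count: 3
    config:
      node.roles: ["data"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # the claim name ECK expects
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
EOF
```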
Meanwhile, my understanding of Elasticsearch shard allocation is that while the pod is down, the cluster goes yellow and the missing replicas are re-allocated: data for shards that have fallen below their configured replica count is copied to the other data nodes until the count is restored.
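Concretely, this is what I'd expect to observe while the pod is down (a sketch; it assumes a port-forward to the cluster on localhost:9200 and omits auth/TLS):

```sh
# Cluster status should report "yellow" with unassigned_shards > 0:
curl -s 'localhost:9200/_cluster/health?pretty'

# List the shard copies that lost their node:
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node' | grep UNASSIGNED
```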
So when the data node eventually comes back online and the disk is automatically re-attached, won't the re-attached disk hold an extra, now-stale copy of shard data that has already been re-replicated onto the remaining data nodes? If so, I'm wondering how the Elasticsearch cluster sorts itself out:
- What does it do with this extra copy? Does it delete one of the copies or keep it around forever?
- If it deletes a copy, which one does it choose?
- If Kubernetes brings the replacement pod online faster than the re-replication can finish (likely in my opinion on a platform like GKE, which provisions Kubernetes nodes quickly), so the cluster stays yellow the whole time, won't one of the data nodes be left with an incomplete/corrupt copy of the data? Can Elasticsearch deal with this safely? (See the sketch after this list for how I'd inspect this.)
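For reference, these are the checks and the setting I suspect are relevant to that last question; a sketch, assuming the same localhost:9200 port-forward as above (the 5m value is just an example, not a recommendation):

```sh
# Ask Elasticsearch to explain an allocation decision. With no body it
# picks an unassigned shard; pass an index/shard in the body to ask
# about a specific copy after the node rejoins:
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'

# The delay before replicas of a departed node are rebuilt elsewhere
# defaults to 1m; raising it gives Kubernetes time to bring the pod
# back before any re-replication starts:
curl -s -X PUT 'localhost:9200/_all/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.unassigned.node_left.delayed_timeout": "5m"}}'
```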