I want to fix a broken cluster (for example, when a node cannot join because of a different cluster UUID) without removing the whole data folder (the dirty solution I see suggested every time...). I am currently testing this using the official Elasticsearch Helm chart:
I created a cluster with 3 master nodes
I deleted the Helm release (the pods are removed but the volumes stay) and changed the cluster name to break the cluster
When re-creating the Helm release, the cluster is broken as expected
I removed all running pods by scaling the StatefulSet to 0
I ran yes | elasticsearch-node detach-cluster; yes | elasticsearch-node remove-customs '*' on every volume (sketched below)
I brought all pods back up by scaling to 3
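Roughly, the whole run looks like this (a sketch of my test, not an official procedure; the StatefulSet name es-test-master comes from my setup, and you still need some way to reach each data volume with the elasticsearch-node tool while the pods are down, e.g. a temporary pod that mounts the PVC):

# stop Elasticsearch so nothing holds the data directories
kubectl scale statefulset es-test-master --replicas=0

# for each master data volume, with Elasticsearch stopped, run the node tool against it
yes | elasticsearch-node detach-cluster
yes | elasticsearch-node remove-customs '*'

# bring the masters back
kubectl scale statefulset es-test-master --replicas=3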
The cluster should form again, but it does not:
{"type": "server", "timestamp": "2021-01-04T09:41:30,190Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "test2", "node.name": "es-test-master-2", "message": "master not discovered yet and this node was detached from its previous cluster, have discovered [{es-test-master-2}{-FGfHUJgRwGEYkXgjFwiGQ}{0QWnT4lIToKQY4_jx6rV-w}{10.233.116.171}{10.233.116.171:9300}{m}{xpack.installed=true, transform.node=false}, {es-test-master-0}{yD4GKy3JSUmV1NW2mcLAtw}{v5eS-gMiRi-OaC88GYcoRw}{10.233.82.166}{10.233.82.166:9300}{m}{xpack.installed=true, transform.node=false}, {es-test-master-1}{7JZ9qyhATPq5ZCejHEkx3g}{UjJLnb6gSR22bY6QZWZfLA}{10.233.110.17}{10.233.110.17:9300}{m}{xpack.installed=true, transform.node=false}]; discovery will continue using [10.233.110.17:9300, 10.233.82.166:9300] from hosts providers and [{es-test-master-2}{-FGfHUJgRwGEYkXgjFwiGQ}{0QWnT4lIToKQY4_jx6rV-w}{10.233.116.171}{10.233.116.171:9300}{m}{xpack.installed=true, transform.node=false}] from last-known cluster state; node term 0, last-accepted version 32 in term 0" }
{"type": "server", "timestamp": "2021-01-04T09:41:35,486Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "test2", "node.name": "es-test-master-2", "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:601) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.10.1.jar:7.10.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }
No, that's not the case at all. Changing the cluster name risks no data loss, nor does it need you to run elasticsearch-node. (Those two statements nearly amount to the same thing: elasticsearch-node always risks data loss.)
Changing the cluster name also doesn't break the cluster. You mention "node cannot join because of a different cluster UUID"; that has nothing to do with the cluster name.
I don't think this means what you think it means. You really should not be using elasticsearch-node detach-cluster. I quote its output here:
You should only run this tool if you have permanently lost all of the
master-eligible nodes in this cluster and you cannot restore the cluster
from a snapshot, or you have already unsafely bootstrapped a new cluster
by running `elasticsearch-node unsafe-bootstrap` on a master-eligible
node that belonged to the same cluster as this node. This tool can cause
arbitrary data loss and its use should be your last resort.
Repeating: This tool can cause arbitrary data loss. If you are concerned about possible data loss then you should not be considering using this tool.
My understanding is that the cluster state is stored in files (as MySQL/MongoDB etc. do), and what I have learned is that it is always possible to change a file.
What if I use EMPTY_CONFIG = new VotingConfiguration(Collections.emptySet()); instead of MUST_JOIN_ELECTED_MASTER? Should the cluster state then really be reset?
That would either result in a broken cluster or else directly lead to data loss too, I'm not sure which.
Perhaps you should take a step back and describe what you're actually trying to do here. Renaming a cluster doesn't break it, but a cluster reporting UUID mismatches hasn't just been renamed and has likely already lost data. Protecting against data loss is the reason for checking that the cluster UUID matches; there is simply no way to bypass that check without risking data loss.
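As an aside, if you want to see the UUID for yourself: every node reports the cluster UUID of the cluster state it has applied on the root endpoint (a quick sketch, assuming the default port and no TLS/authentication):

curl -s 'http://localhost:9200/?pretty' | grep cluster_uuid
# all nodes of the same cluster print the same "cluster_uuid" value;
# a node that has not yet joined any cluster prints "_na_"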
I don't know how to emphasise any more strongly that what you are doing is dangerous and will eventually result in data loss. The process you are describing doesn't safely recover from a split brain; in fact it bypasses the safety checks that are there to prevent data loss caused by split brain. If you are finding that you need to do this then there is something very, very wrong with how you are managing your Elasticsearch cluster.
Ultimately it's your data so it's your call, but I cannot overstate to future readers of this thread how important it is not to follow the same path unless they also have no regard for the integrity of their data.
I am pretty sure everyone will try this fix before removing their data.
I understand Elasticsearch is not a database, but only a super indexer/search service, and its data must always be backed up somewhere. But in some cases we are using it as the primary data source (for log analysis), so we want something really reliable.
A correctly-orchestrated cluster needs neither of these solutions. If you are getting into situations where you end up needing to do anything like this then you're doing something wrong.
Absolutely. For instance, yesterday someone accidentally removed the "master" Helm release together with its PVCs and then re-created it, hence the different cluster UUID (roughly the sequence sketched below).
Errors can happen, from humans or machines; that is why software provides disaster recovery tools and documentation (like the awesome elasticsearch-node binary).
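To illustrate, the accident was roughly this sequence (a sketch; the release name and label selector are just how our setup looks, adjust for yours):

helm uninstall es-test
# deleting the PVCs as well is the fatal step:
kubectl delete pvc -l app=es-test-master
helm install es-test elastic/elasticsearch -f values.yaml
# the re-created masters bootstrap a brand-new cluster with a new cluster UUID,
# so surviving nodes/volumes that still carry the old UUID can no longer join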
Dangerous tools like elasticsearch-node and the dangling indices API exist for cases where things go so badly wrong that data loss is inevitable. They are not awesome at all; they are a last resort for when things are desperately broken. They don't really "fix" anything, and if you use them as a matter of course then you will eventually lose data as a result.