I want to fix a broken cluster (for example, when a node cannot join because of a different cluster UUID) without deleting the whole data folder, which is the dirty solution I see suggested every time. I am currently testing this with the official Elasticsearch Helm chart:
- I created a cluster with 3 master nodes
- I deleted the Helm release (the pods are removed but the volumes stay) and changed the cluster name to break the cluster
- After re-creating the Helm release, the cluster is broken, as expected
- I removed all running pods by scaling to 0
- On all volumes, I ran:
  yes | elasticsearch-node detach-cluster; yes | elasticsearch-node remove-customs *
- I brought all pods back up by scaling to 3
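To make the reproduction easier to follow, here is a rough sketch of the last three steps above as a script. It assumes the StatefulSet is named `es-test-master` (inferred from the pod names in the log below) and leaves the per-volume `elasticsearch-node` run as a manual step, because how the volumes are mounted for that command is not described here. The `DRY_RUN` switch and the `run`/`reproduce` helpers are hypothetical names for this sketch; by default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps. ASSUMPTIONS (not from the original post):
#   - the StatefulSet is named "es-test-master" (inferred from the pod names);
#   - the elasticsearch-node commands are run on each data volume by some
#     other means, so this script only prints a reminder for that step.
set -euo pipefail

STS="${STS:-es-test-master}"
REPLICAS="${REPLICAS:-3}"

# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

reproduce() {
  # Stop every Elasticsearch process so the data paths are not in use.
  run kubectl scale statefulset "$STS" --replicas=0

  # Manual step: on each node's data volume, run the commands from the post.
  for i in $(seq 0 $((REPLICAS - 1))); do
    echo "# on the data volume of ${STS}-${i}:"
    echo "#   yes | elasticsearch-node detach-cluster; yes | elasticsearch-node remove-customs *"
  done

  # Bring the master-eligible nodes back.
  run kubectl scale statefulset "$STS" --replicas="$REPLICAS"
}

reproduce
```

Defaulting to a dry run is deliberate: scaling the StatefulSet and detaching nodes are disruptive, so the commands are only executed with `DRY_RUN=0`.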
Clustering should work, but it does not; for example, es-test-master-2 logs:
{"type": "server", "timestamp": "2021-01-04T09:41:30,190Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "test2", "node.name": "es-test-master-2", "message": "master not discovered yet and this node was detached from its previous cluster, have discovered [{es-test-master-2}{-FGfHUJgRwGEYkXgjFwiGQ}{0QWnT4lIToKQY4_jx6rV-w}{10.233.116.171}{10.233.116.171:9300}{m}{xpack.installed=true, transform.node=false}, {es-test-master-0}{yD4GKy3JSUmV1NW2mcLAtw}{v5eS-gMiRi-OaC88GYcoRw}{10.233.82.166}{10.233.82.166:9300}{m}{xpack.installed=true, transform.node=false}, {es-test-master-1}{7JZ9qyhATPq5ZCejHEkx3g}{UjJLnb6gSR22bY6QZWZfLA}{10.233.110.17}{10.233.110.17:9300}{m}{xpack.installed=true, transform.node=false}]; discovery will continue using [10.233.110.17:9300, 10.233.82.166:9300] from hosts providers and [{es-test-master-2}{-FGfHUJgRwGEYkXgjFwiGQ}{0QWnT4lIToKQY4_jx6rV-w}{10.233.116.171}{10.233.116.171:9300}{m}{xpack.installed=true, transform.node=false}] from last-known cluster state; node term 0, last-accepted version 32 in term 0" }
{"type": "server", "timestamp": "2021-01-04T09:41:35,486Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "test2", "node.name": "es-test-master-2", "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null",
"at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:230) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:335) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:601) [elasticsearch-7.10.1.jar:7.10.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.10.1.jar:7.10.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }
How can I fix this?