Nodes fail to join cluster after full cluster restart (cluster uuid mismatch?)

I was forced to stop all nodes in our cluster, and now I can't bring the cluster back up. It looks like there is a cluster UUID mismatch, but I don't know why this happened or how to fix it.

"type": "server", "timestamp": "2019-11-18T10:46:02,609Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "docker-cluster", "node.name": "node-002", "message": "failed to validate incoming join request from node [{node-012}{GrpvmVyVSOm2UpZQIUa3pg}{guMNX7HRT0q8-Lx7HL0Cnw}{10.33.9.82}{10.33.9.82:9300}{dil}{ml.machine_memory=67388260352, ml.max_open_jobs=20, xpack.installed=true}]", "cluster.uuid": "fQo4028sSN-QWcCaG2w_ZA", "node.id": "v9st6CCkQyioc6YFMbj3Mg" , 
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [node-012][172.19.0.2:9300][internal:cluster/coordination/join/validate]",
"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid fQo4028sSN-QWcCaG2w_ZA than local cluster uuid i44zLmaER4ipQYj-F9QVDw, rejecting",

The data folder on each node is unchanged. What can I do to bring up the cluster again?

The cluster UUID is stored on disk on all master-eligible nodes and on all data nodes, and it must match so that nodes cannot accidentally join a different cluster, since that is a good way to lose data. The usual way to hit this exception is to be using ephemeral storage on the master-eligible nodes. If the cluster UUID is missing on the master-eligible nodes then they will invent a new one, but this indicates that they have lost the rest of the cluster metadata too, which means the data on your data nodes can no longer be read correctly. If so, the safest way to proceed is to fix the storage on the master-eligible nodes so it persists across restarts and then restore your data from a recent snapshot.
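For reference, each running node reports the cluster UUID it currently belongs to on its root endpoint, so you can compare what the nodes think they are part of (a minimal sketch; the host names are placeholders for your own nodes):

```
# Ask two nodes for their view of the cluster and compare the "cluster_uuid" fields.
# (Add -u <user>:<password> if security is enabled.)
curl -s 'http://node-002:9200/?pretty' | grep cluster_uuid
curl -s 'http://node-012:9200/?pretty' | grep cluster_uuid
```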

We’re using persistent storage on all nodes including the master-eligible nodes.

Is a full restore my only option? This is our production cluster and a full restore would take too long.

Maybe also worth noting: I’ve done a successful full cluster restart previously.

What exact version are you using?

Can you grep all your logs on all nodes for INFO messages containing the string "cluster UUID", going back as far as possible, and share those logs here (or on https://gist.github.com if they don't fit here)?
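Something along these lines should do it, adjusted to wherever your nodes actually write their logs (the paths and container name below are just examples):

```
# On each host: pull every line mentioning the cluster UUID out of the
# Elasticsearch logs, including rotated ones.
grep -i "cluster uuid" /var/log/elasticsearch/*.log

# Or, if the containers log to stdout:
docker logs my-es-container 2>&1 | grep -i "cluster uuid"
```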

OK, so your initial guess was right: The data folder was missing.

During the time that the cluster was offline, a cron job (that I was unaware of, running docker system prune --volumes --force) removed the Docker volume containing /usr/share/elasticsearch/data on four of our nodes, including the eligible master nodes.

We're looking into the possibility of restoring /var/lib/docker (and hopefully the volumes) from backup. Would this be a bad idea? Will this leave our cluster in an inconsistent state? Four of the nodes would be using old data folders.

Yep that'd do it.

It is risky to try and restore from a filesystem backup and I can't recommend it in good conscience, since it will take some of your nodes "back in time" and the effects of this are undefined. It may result in lost data (possibly silently) or may render some of your indices unreadable.

We have nightly snapshots of our data. Is there a guide on how to restore a full cluster (including security data) from scratch from a snapshot?

I don't know of anything more specific than the restore docs. You may need to disable some components (e.g. Kibana, monitoring, watcher, rollups, ...) for the duration of the restore since they may otherwise create indices that block the restore, and you'll need to use a security realm other than native since the native realm uses the .security index that you'll be restoring.
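If it helps, the restore itself can be a single request once the snapshot repository is registered (a rough sketch only; the repository and snapshot names are placeholders, and any existing indices with the same names must be closed or deleted first):

```
# Restore every index from the snapshot, plus the global cluster state
# (templates, persistent settings). Repository/snapshot names are placeholders.
curl -X POST 'localhost:9200/_snapshot/nightly-backups/snapshot-2019-11-17/_restore?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "indices": "*",
  "include_global_state": true
}'
```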

Thanks!

In which path are this UUID and the cluster metadata stored? In the data directory of the node?
If not, that would mean a full cluster restart would always fail in a Kubernetes environment, because the filesystem of a pod is always ephemeral (unless it is mounted from volumes)...

Yes, it lives in the node's data directory (path.data), so in a Kubernetes environment the data path of the master-eligible and data nodes needs to be on a persistent volume.
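If you want to see it on disk: with the default Docker image and a 7.x node, the metadata sits under the data path, roughly like this (just a sketch of the layout):

```
# The cluster metadata, including the cluster UUID, is kept under the node's data path:
ls /usr/share/elasticsearch/data/nodes/0/_state
```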
