Elasticsearch stuck with cluster_uuid NA

We have a production Elasticsearch cluster managed by operator version 1.0.1. We planned to upgrade the operator to the latest version, 1.5, so that we can take advantage of the dynamic storage changes.

When we ran the operator upgrade, the pods in the cluster started getting recreated. The first pod was recreated with the new CRD changes and came up reporting "cluster_uuid" : "_na_".

The cluster has 5 pods [es-0 to es-4].
Three of them are master-eligible.

Currently, pod es-4 has been recreated without a cluster UUID, so it is not able to join the existing cluster.

Any suggestions here?

Logs:

[WARN ][o.e.c.c.ClusterFormationFailureHelper] [es-4] master not discovered or elected yet, an election requires at least 3 nodes with ids from [h_EWLH6LQ_uyGXccS180EA, lnygcbVsTcW-iP9tYokRyg, D-EpZDHvRmWYPkYoCmKZOg, Uzm73zuMS1yaRLzPwEm_Bw, 8rVHfnlqTw-o-c_EZwExWA], have discovered [{es-4}{h_EWLH6LQ_uyGXccS180EA}{cn_NslDWSqaNLUD-LZYEnA}{1X.X.X.X37.18}{1X.X.X.X37.18:9300}{cdhilmrstw}{k8s_node_name=gke-node-4868d10d-h4ws, ml.machine_memory=X.X.X.X6736, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [X.X.X.X.1:9300, X.X.X.X.1:9301, X.X.X.X.1:9302, X.X.X.X.1:9303, X.X.X.X.1:9304, X.X.X.X.1:9305, 1X.X.X.X3X.X.X.X0, 1X.X.X.X41.18:9300, 1X.X.X.X43.20:9300, 1X.X.X.X58.22:9300] from hosts providers and [{es-4}{h_EWLH6LQ_uyGXccS180EA}{cn_NslDWSqaNLUD-LZYEnA}{1X.X.X.X37.18}{1X.X.X.X37.18:9300}{cdhilmrstw}{k8s_node_name=gke-node-4868d10d-h4ws, ml.machine_memory=X.X.X.X6736, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 133, last-accepted version 53845 in term 133
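
The warning shows es-4 probing the transport addresses it got from the hosts providers but only discovering itself. A quick way to cross-check those addresses against the live pods is shown below; the label selector and the es-es-transport service name follow the usual ECK conventions for a cluster named "es" and are assumptions to adjust:

kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=es -o wide
kubectl get endpoints es-es-transport -o wide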

Cluster metadata:

# curl -X GET "localhost:9200"
{
  "name" : "es-4",
  "cluster_name" : "es",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "747e1cc715sHKXqdef077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
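
For comparison, the same request against one of the healthy pods should return the real cluster UUID rather than "_na_". A minimal check, assuming plain HTTP on localhost as in the curl above (adjust if TLS/auth are enabled):

kubectl exec es-0 -- curl -s "localhost:9200"
kubectl exec es-0 -- curl -s "localhost:9200/_cluster/health?pretty"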

This node cannot discover any other nodes at the addresses provided. Either the addresses are wrong, or else there's a connectivity problem.

@DavidTurner
The IP addresses that es-4 is trying to connect to are correct, and there are no connectivity problems.

Those IPs belong to the nodes in the ES cluster.
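
To rule connectivity in or out more explicitly, here is a rough TCP check of the transport port from inside es-4; the target address is a placeholder for one of the IPs in the log, and the assumption that bash and timeout exist in the image is worth verifying:

kubectl exec es-4 -- bash -c 'timeout 5 bash -c "echo > /dev/tcp/<other-node-ip>/9300" && echo reachable || echo unreachable'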

FYI: es-{0-4} are StatefulSet pods.

We have 5 pods, es-{0-4}, in the ES cluster, managed by version 1.0 of the Elasticsearch operator.

The latest version is 1.5.
Ref Upgrade ECK | Elastic Cloud on Kubernetes [1.5] | Elastic

As part of the operator upgrade, we have changed the operator version and deployed it.
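
For reference, the upgrade itself was roughly the following, per the linked upgrade guide; the exact manifest URL is the one from that page and worth double-checking:

kubectl apply -f https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml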

The operator started recreating the pods. It started with es-4, which was recreated but is not able to connect to the existing cluster. There are no network connectivity issues.

To test the changes, I tried the same upgrade in our test environment. The operator recreated es-4, and I had to delete two more nodes/pods [es-3, es-2] so they would be recreated [to satisfy the quorum requirement] and form a cluster.
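
In the test env this amounted to something like the following, using the pod names as they appear in our cluster (the exact names depend on how the nodeSet is defined, so treat them as placeholders):

kubectl delete pod es-3 es-2
# wait for the StatefulSet/operator to bring them back, then check membership:
kubectl exec es-0 -- curl -s "localhost:9200/_cat/nodes?v"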

The remaining nodes/pods, es-1 and es-0, were taken care of by the operator; they were recreated and rejoined without any manual deletions.

Is that expected behavior? Do we need to manually kill/delete the pods so that they can rejoin the cluster?

This seems like a contradiction?

To be more specific: es-4 was able to be recreated, but it is not able to join the cluster.

Any suggestions on how to fix this?

I tried upgrading the Elasticsearch operator from version 1.0 to 1.5.

As you said, the node is not able to connect to the existing cluster. That's a connectivity problem.

You could try setting logger.org.elasticsearch.discovery: DEBUG to expose the low-level exceptions that Elasticsearch is seeing, but they normally don't tell us much more than we already know here.
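
Since es-4 has not joined a cluster, this cannot go through the dynamic cluster settings API on that node; it would have to be set in the node configuration, i.e. the config section of the nodeSet in the Elasticsearch manifest. A minimal sketch, assuming the resource is named es with a single nodeSet called default (merge it into the existing spec rather than applying it verbatim):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 7.10.2
  nodeSets:
  - name: default
    count: 5
    config:
      # hypothetical placement of the suggested setting
      logger.org.elasticsearch.discovery: DEBUG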