Data node's cluster UUID different from master node's cluster UUID

111197 · August 26, 2019, 9:28am

Hi, all.

I'm trying to build an Elasticsearch 7.3.0 cluster (on Docker on AWS ECS).
The master and ingest nodes have been built successfully, as shown below.
(The domain name and IPs are placeholders.)

curl EXAMPLE-DOMAIN.com:9200/_cat/nodes                                                                             
LOCAL-IP(ingest-a)  3 57 0 0.05 0.01 0.00 i - LOCAL-IP(ingest-a):9200
LOCAL-IP(master-b) 8 93 0 0.00 0.00 0.00 m - master-b
LOCAL-IP(master-a)  8 95 0 0.00 0.00 0.00 m * master-a
LOCAL-IP(ingest-b)  2 29 0 0.00 0.05 0.02 i - LOCAL-IP(ingest-b):9200
LOCAL-IP(master-c)  8 95 0 0.00 0.00 0.00 m - master-c
LOCAL-IP(ingest-c) 2 30 0 0.01 0.01 0.00 i - LOCAL-IP(ingest-c):9200

I tried to add a data node to this cluster, but it never joins.
The data node's logs:

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: 
    join validation on cluster state with a different cluster uuid ORIGIN_CLUSTER_UUID than 
    local cluster uuid DATA_NODES_CLUSTER_UUID, rejecting

And the ORIGIN_CLUSTER (master and ingest nodes) logs:

[o.e.c.c.Coordinator] [master-a] failed to validate incoming join request from node [{LOCAL-IP(data):9200}{lW1oIt9pTZGbxH4v466RJw}{fisHM6M9SX6runjHWvPCEg}{LOCAL-IP(data)}{LOCAL-IP(data):9300}{d}{aws_spot_zone=AWS_ZONE/INSTANCE , xpack.installed=true}]
org.elasticsearch.transport.RemoteTransportException: [LOCAL-IP(data):9200][172.17.0.2:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid ORIGIN_CLUSTER_UUID than local cluster uuid DATA_NODES_CLUSTER_UUID, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.3.0.jar:7.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]

From other logs, it looks like DATA_NODES_CLUSTER_UUID is assigned before the data node communicates with ORIGIN_CLUSTER.

Each node's config YAML is below.

Uppercase words are environment variables.
Every node uses this same YAML.

cluster.name: "${ES_CLUSTER}"

cluster.routing.allocation.awareness.attributes: aws_spot_zone
cluster.routing.allocation.cluster_concurrent_rebalance: 240
cluster.routing.allocation.node_concurrent_recoveries: 4
cluster.remote.connect: true
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
# ------------------------------------ Node ------------------------------------
node.name: "${AWS_LOCAL_IPV4}:${HTTP_PORT}"
node.master: ${IS_MASTER_NODE}  # true if node is master.
node.data: ${IS_DATA_NODE}      # true if node is data.
node.ingest: ${IS_INGEST_NODE}  # true if node is ingest.

node.attr.aws_spot_zone: "${AWS_AVAILABILITY_ZONE}/${AWS_INSTANCE_TYPE}"

# ----------------------------------- Memory -----------------------------------

bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------

network.host: 0.0.0.0

http.compression: true
http.port: 9200
http.publish_port: 9200
http.publish_host: ${AWS_LOCAL_IPV4}

transport.publish_port: 9300
transport.publish_host: ${AWS_LOCAL_IPV4}

http.cors.enabled: true
http.cors.allow-origin: "*"

# ---------------------------------- X-Pack ------------------------------------

xpack.security.enabled: false
xpack.monitoring:
  enabled: false
xpack.ml.enabled: false

# --------------------------------- Discovery ----------------------------------

discovery.zen.ping_timeout: 30s
discovery.ec2.endpoint: ec2.${EC2_REGION}.amazonaws.com

discovery.zen.hosts_provider: ec2
discovery.ec2.tag.es_cluster: "${ES_CLUSTER}"

discovery.seed_providers: ec2
discovery.seed_hosts:
  - master-a
  - master-b
  - master-c

# ----------------------------------- Cloud ------------------------------------

cloud.node.auto_attributes: false 

# ---------------------------------- Indices -----------------------------------

indices.breaker.total.limit: 70%

indices.breaker.fielddata.limit: 60%
indices.breaker.fielddata.overhead: 1.03

indices.breaker.request.limit: 60% 
indices.breaker.request.overhead: 1

network.breaker.inflight_requests.limit: 70%
network.breaker.inflight_requests.overhead: 1


indices.memory.index_buffer_size: 512mb

indices.recovery.max_bytes_per_sec: 500mb

# ---------------------------------- Various -----------------------------------

monitor.jvm.gc.overhead.warn: 100
monitor.jvm.gc.overhead.info: 80
monitor.jvm.gc.overhead.debug: 60

Does any of this config affect how the cluster UUID is assigned?
Thank you in advance.

DavidTurner · August 26, 2019, 9:53am

The cluster UUID is assigned by the elected master node when the cluster first forms, and is stored on disk on each node. A common reason for getting this message is that your master nodes are not using persistent storage, so they lose all their data when restarted and form a new cluster that the data nodes cannot join.

111197 · August 26, 2019, 10:40am

Thank you for the quick response.

This happens when creating the cluster.
There is still no data on the master, ingest, or data nodes (no indices either).
It looks like the data node never joins even though it has no data.
Is this related to disk persistence?

FYI, each node's response is below.

ingest and master

# curl NODE-NAME.com:9200
{
  "name" : "NODE-NAME:9200",
  "cluster_name" : "ES_CLUSTER",
  "cluster_uuid" : "vcPboLtxQXyPhJMe8bn44A", # same cluster uuid
  "version" : {
    "number" : "7.3.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "de777fa",
    "build_date" : "2019-07-24T18:30:11.767338Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

data

# curl NODE-NAME.com:9200
{
  "name" : "NODE-NAME:9200",
  "cluster_name" : "ES_CLUSTER",
  "cluster_uuid" : "_na_", # no cluster uuid
  "version" : {
    "number" : "7.3.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "de777fa",
    "build_date" : "2019-07-24T18:30:11.767338Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
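
The two responses above can also be compared programmatically. The following is only a minimal sketch, assuming each node's root response has already been fetched (for example with the `curl` calls shown) and parsed from JSON. The node labels and the helper names `group_by_cluster_uuid` and `unjoined_nodes` are hypothetical; the UUID values are the ones quoted in this post.

```python
from collections import defaultdict

# Value reported in "cluster_uuid" when the node knows no cluster UUID,
# as in the data node's response above.
NOT_JOINED = "_na_"


def group_by_cluster_uuid(responses):
    """Map each cluster_uuid to the sorted list of node labels reporting it.

    `responses` maps a node label (hypothetical, chosen by the caller) to the
    parsed JSON body returned by that node's root endpoint.
    """
    groups = defaultdict(list)
    for label, body in responses.items():
        groups[body.get("cluster_uuid", NOT_JOINED)].append(label)
    return {uuid: sorted(labels) for uuid, labels in groups.items()}


def unjoined_nodes(responses):
    """Return the labels of nodes that report no cluster UUID."""
    return group_by_cluster_uuid(responses).get(NOT_JOINED, [])


# Bodies trimmed to the one relevant field; labels are illustrative only.
responses = {
    "master-a": {"cluster_uuid": "vcPboLtxQXyPhJMe8bn44A"},
    "ingest-a": {"cluster_uuid": "vcPboLtxQXyPhJMe8bn44A"},
    "data-a": {"cluster_uuid": "_na_"},
}

print(group_by_cluster_uuid(responses))
print(unjoined_nodes(responses))
```

More than one real UUID in the result would mean the nodes belong to separately formed clusters; a `_na_` entry only says that the node has not joined any cluster yet.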

DavidTurner · August 26, 2019, 11:07am

I do not think this will result in the log message you quoted above:

111197:

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: 
    join validation on cluster state with a different cluster uuid ORIGIN_CLUSTER_UUID than 
    local cluster uuid DATA_NODES_CLUSTER_UUID, rejecting

Can you share more logs? This one line isn't very helpful.

111197 · August 27, 2019, 3:13am

Here are the logs from when the data node initialized.

gist.github.com

https://gist.github.com/y-kanno-im/ec29ba23e142085254bfb49bddb1bb71

elasticsearch data node logs

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
 [o.e.c.l.LogConfigurator] [DATANODE:9200] Some logging configurations have %marker but don't have %node_name. We will automatically add %node_name to the pattern to ease the migration for users who customize log4j2.properties but will stop this behavior in 7.0. You should manually replace `%node_name` with `[%node_name]%marker ` in these locations:
   /usr/share/elasticsearch/config/log4j2.properties
 -
 lo
        inet 127.0.0.1 netmask:255.0.0.0 scope:host
        UP LOOPBACK mtu:65536 index:1
 -
 eth0
        inet 172.17.0.2 netmask:255.255.0.0 broadcast:172.17.255.255 scope:site

This file has been truncated.

It looks like the data node's cluster UUID BSRgW1mNS4ye1GfFHqVsjQ is assigned before the TcpTransport layer errors (that is, before it communicates with the master and ingest nodes).
Why is a different cluster UUID assigned to the data node?

DavidTurner · August 27, 2019, 8:53am

It's reading this UUID from its data path. See this note in the docs.
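
To see whether a data path already holds state from an earlier start, one option is to list the state files under it. This is a rough sketch, not an official tool: it assumes the 7.3.0 on-disk layout that appears later in this thread (`nodes/0/_state/node-0.st`), and the function name `find_state_files` is made up for illustration. Other versions may lay the directory out differently.

```python
from pathlib import Path


def find_state_files(data_path):
    """Return the relative paths of all *.st files under data_path, sorted.

    Assumption: persisted state lives in files ending in ".st", as in the
    directory listing posted in this thread. Returns an empty list when the
    path does not exist or contains no such files.
    """
    root = Path(data_path)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.st")
        if p.is_file()
    )


# Example call, using the data path from this thread:
# find_state_files("/usr/share/elasticsearch/data")
```

A non-empty result on a node that is supposed to be brand new suggests the data path was reused from a previous run.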

111197 · August 28, 2019, 3:35am

Here are the contents of the data paths.

master, ingest

/usr/share/elasticsearch/data
/usr/share/elasticsearch/data/nodes
/usr/share/elasticsearch/data/nodes/0
/usr/share/elasticsearch/data/nodes/0/_state
/usr/share/elasticsearch/data/nodes/0/_state/node-0.st
/usr/share/elasticsearch/data/nodes/0/node.lock

data

/usr/share/elasticsearch/data

On the data node, a cluster UUID is assigned even though the data path has no files.
I want this data node to have the same files as the master and ingest nodes.
Can you tell me which settings should be changed?

If there is not enough information, please let me know.

DavidTurner · August 28, 2019, 6:53am

As per the link I sent above you need to start this cluster again from scratch by deleting all the contents of the data paths.
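
For illustration only, emptying a data path could be sketched as below. This assumes the node is stopped and that losing everything under the path is acceptable, which holds here only because the cluster has no indices yet. The helper name `clear_data_path` is hypothetical, and the path in the usage comment is the one from this thread.

```python
import shutil
from pathlib import Path


def clear_data_path(data_path):
    """Delete everything inside data_path but keep the directory itself.

    The directory is kept because it may be a mount point. Returns the names
    of the top-level entries that were removed, sorted.
    WARNING: this is destructive and irreversible.
    """
    root = Path(data_path)
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a real directory and its contents
        else:
            entry.unlink()  # remove a file or a symlink
        removed.append(entry.name)
    return removed


# Example call, to be run on each stopped node before starting from scratch:
# clear_data_path("/usr/share/elasticsearch/data")
```

After every data path is empty, the nodes can be started again so that they form one cluster and share a single cluster UUID.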

