Data node's cluster UUID differs from the master nodes' cluster UUID

Hi, all.

I'm trying to build an Elasticsearch 7.3.0 cluster (on Docker on AWS ECS).
The master and ingest nodes have been built successfully, as shown below.
(The domain name and IPs are placeholders for illustration.)

curl EXAMPLE-DOMAIN.com:9200/_cat/nodes                                                                             
LOCAL-IP(ingest-a)  3 57 0 0.05 0.01 0.00 i - LOCAL-IP(ingest-a):9200
LOCAL-IP(master-b) 8 93 0 0.00 0.00 0.00 m - master-b
LOCAL-IP(master-a)  8 95 0 0.00 0.00 0.00 m * master-a
LOCAL-IP(ingest-b)  2 29 0 0.00 0.05 0.02 i - LOCAL-IP(ingest-b):9200
LOCAL-IP(master-c)  8 95 0 0.00 0.00 0.00 m - master-c
LOCAL-IP(ingest-c) 2 30 0 0.01 0.01 0.00 i - LOCAL-IP(ingest-c):9200

I tried to add a data node to this cluster, but the data node never joins.
The data node's log output:

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: 
    join validation on cluster state with a different cluster uuid ORIGIN_CLUSTER_UUID than 
    local cluster uuid DATA_NODES_CLUSTER_UUID, rejecting

And the log output from ORIGIN_CLUSTER (the master and ingest nodes):

[o.e.c.c.Coordinator] [master-a] failed to validate incoming join request from node [{LOCAL-IP(data):9200}{lW1oIt9pTZGbxH4v466RJw}{fisHM6M9SX6runjHWvPCEg}{LOCAL-IP(data)}{LOCAL-IP(data):9300}{d}{aws_spot_zone=AWS_ZONE/INSTANCE , xpack.installed=true}]
org.elasticsearch.transport.RemoteTransportException: [LOCAL-IP(data):9200][172.17.0.2:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid ORIGIN_CLUSTER_UUID than local cluster uuid DATA_NODES_CLUSTER_UUID, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.3.0.jar:7.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
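
When comparing logs from several nodes, it can help to pull the two UUIDs out of the rejection message mechanically. The following is a minimal sketch, assuming the message has the same wording as the one quoted above; the function name `extract_cluster_uuids` and the pattern are illustrative and not part of Elasticsearch.

```python
import re

# Matches the wording of the CoordinationStateRejectedException message
# quoted in this thread. Other versions may word it differently.
JOIN_REJECT = re.compile(
    r"different cluster uuid (?P<remote>\S+) "
    r"than local cluster uuid (?P<local>[^\s,]+)"
)


def extract_cluster_uuids(log_text):
    """Return (remote_uuid, local_uuid) from the first rejection message.

    Returns None when the text contains no such message.
    """
    # Pasted logs are often wrapped, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    match = JOIN_REJECT.search(flat)
    if match is None:
        return None
    return match.group("remote"), match.group("local")
```

If the first value matches the `cluster_uuid` reported by the master nodes and the second does not, the joining node already has a different UUID recorded locally.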

From other logs, it looks like the cause is that DATA_NODES_CLUSTER_UUID is assigned before the data node communicates with ORIGIN_CLUSTER.

Each node's config YAML is below.

Uppercase words are environment variables.
Every node uses this same YAML file.

cluster.name: "${ES_CLUSTER}"

cluster.routing.allocation.awareness.attributes: aws_spot_zone
cluster.routing.allocation.cluster_concurrent_rebalance: 240
cluster.routing.allocation.node_concurrent_recoveries: 4
cluster.remote.connect: true
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
# ------------------------------------ Node ------------------------------------
node.name: "${AWS_LOCAL_IPV4}:${HTTP_PORT}"
node.master: ${IS_MASTER_NODE}  # true if node is master.
node.data: ${IS_DATA_NODE}      # true if node is data.
node.ingest: ${IS_INGEST_NODE}  # true if node is ingest.

node.attr.aws_spot_zone: "${AWS_AVAILABILITY_ZONE}/${AWS_INSTANCE_TYPE}"

# ----------------------------------- Memory -----------------------------------

bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------

network.host: 0.0.0.0

http.compression: true
http.port: 9200
http.publish_port: 9200
http.publish_host: ${AWS_LOCAL_IPV4}

transport.publish_port: 9300
transport.publish_host: ${AWS_LOCAL_IPV4}

http.cors.enabled: true
http.cors.allow-origin: "*"

# ---------------------------------- X-Pack ------------------------------------

xpack.security.enabled: false
xpack.monitoring:
  enabled: false
xpack.ml.enabled: false

# --------------------------------- Discovery ----------------------------------

discovery.zen.ping_timeout: 30s
discovery.ec2.endpoint: ec2.${EC2_REGION}.amazonaws.com

discovery.zen.hosts_provider: ec2
discovery.ec2.tag.es_cluster: "${ES_CLUSTER}"

discovery.seed_providers: ec2
discovery.seed_hosts:
  - master-a
  - master-b
  - master-c

# ----------------------------------- Cloud ------------------------------------

cloud.node.auto_attributes: false 

# ---------------------------------- Indices -----------------------------------

indices.breaker.total.limit: 70%

indices.breaker.fielddata.limit: 60%
indices.breaker.fielddata.overhead: 1.03

indices.breaker.request.limit: 60% 
indices.breaker.request.overhead: 1

network.breaker.inflight_requests.limit: 70%
network.breaker.inflight_requests.overhead: 1


indices.memory.index_buffer_size: 512mb

indices.recovery.max_bytes_per_sec: 500mb

# ---------------------------------- Various -----------------------------------

monitor.jvm.gc.overhead.warn: 100
monitor.jvm.gc.overhead.info: 80
monitor.jvm.gc.overhead.debug: 60

Do any of these settings affect how the cluster UUID is assigned?
Thank you in advance.

The cluster UUID is assigned by the elected master node when the cluster first forms, and is stored on disk on each node. A common reason for getting this message is that your master nodes are not using persistent storage, so they lose all their data when restarted and form a new cluster that the data nodes cannot join.

Thank you for the quick response :)

This happens when creating the cluster.
There is still no data on the master, ingest, or data nodes (and no indices either).
It looks like the data node never joins even though it has no data.
Is this related to disk persistence?

FYI, each node's response is below.

ingest and master

# curl NODE-NAME.com:9200
{
  "name" : "NODE-NAME:9200",
  "cluster_name" : "ES_CLUSTER",
  "cluster_uuid" : "vcPboLtxQXyPhJMe8bn44A", # same cluster uuid
  "version" : {
    "number" : "7.3.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "de777fa",
    "build_date" : "2019-07-24T18:30:11.767338Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

data

# curl NODE-NAME.com:9200
{
  "name" : "NODE-NAME:9200",
  "cluster_name" : "ES_CLUSTER",
  "cluster_uuid" : "_na_", # no cluster uuid
  "version" : {
    "number" : "7.3.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "de777fa",
    "build_date" : "2019-07-24T18:30:11.767338Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
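
The comparison between the two responses above can be automated once the bodies of `GET /` have been fetched and parsed. This is a minimal sketch, assuming the responses are already available as Python dicts keyed by a node label of your choosing; the helper names are hypothetical.

```python
from collections import defaultdict


def group_by_cluster_uuid(responses):
    """Map each reported cluster_uuid to the sorted node labels reporting it.

    `responses` maps a node label to the parsed JSON body of `GET /`.
    A missing field is treated like the "_na_" placeholder.
    """
    groups = defaultdict(list)
    for label, body in responses.items():
        groups[body.get("cluster_uuid", "_na_")].append(label)
    return {uuid: sorted(labels) for uuid, labels in groups.items()}


def nodes_outside_cluster(responses, expected_uuid):
    """Return the sorted labels of nodes not reporting expected_uuid."""
    return sorted(
        label
        for label, body in responses.items()
        if body.get("cluster_uuid") != expected_uuid
    )
```

With the values from this thread, the master and ingest nodes fall into one group under `vcPboLtxQXyPhJMe8bn44A`, and the data node is reported separately under `_na_`.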

I do not think this will result in the log message you quoted above:

Can you share more logs? This one line isn't very helpful.

Here are the logs from when the data node initialized.

It looks like the data node's cluster UUID BSRgW1mNS4ye1GfFHqVsjQ is assigned before the TcpTransport layer errors (that is, before it communicates with the master and ingest nodes).
Why is a different cluster UUID assigned to the data node?

It's reading this UUID from its data path. See this note in the docs.
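
To see what a node could read back at startup, you can list any state files already present under its data path. This is a rough sketch, assuming the 7.x on-disk layout `<data path>/nodes/<ordinal>/_state/*.st`; the function name is hypothetical, and the path you pass should be whatever `path.data` resolves to inside the container.

```python
from pathlib import Path


def persisted_state_files(data_path):
    """Return the state files found under data_path, relative to it.

    Looks only at nodes/*/_state/*.st, which is an assumption about the
    layout used by Elasticsearch 7.x.
    """
    root = Path(data_path)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob("nodes/*/_state/*.st")
    )
```

An empty result means the directory you inspected holds no persisted state. If the node still reports a UUID in that case, it is worth checking that the inspected directory is really the one the running process uses.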

Here is the content of the data path.


master, ingest

/usr/share/elasticsearch/data
/usr/share/elasticsearch/data/nodes
/usr/share/elasticsearch/data/nodes/0
/usr/share/elasticsearch/data/nodes/0/_state
/usr/share/elasticsearch/data/nodes/0/_state/node-0.st
/usr/share/elasticsearch/data/nodes/0/node.lock

data

/usr/share/elasticsearch/data

On the data node, a cluster UUID is assigned even though the data path has no files.
I want the data node to have the same files as the master and ingest nodes.
Can you tell me which settings should be fixed?

If there is not enough information, please let me know.

As per the link I sent above, you need to start this cluster again from scratch by deleting all the contents of the data paths.
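
Emptying a data path can be scripted so that the directory itself (for example a mounted volume) stays in place. This is a minimal sketch, assuming the node is stopped first and that losing everything under the path is acceptable; `clear_data_path` is a hypothetical helper, and it defaults to a dry run because the deletion is irreversible.

```python
import shutil
from pathlib import Path


def clear_data_path(data_path, dry_run=True):
    """Remove every entry inside data_path but keep the directory itself.

    Returns the names of the top-level entries that were (or, in a dry
    run, would be) removed.
    """
    root = Path(data_path)
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real directory: remove recursively
            else:
                entry.unlink()  # regular file or symlink
    return [entry.name for entry in entries]
```

Run it once with the default `dry_run=True` to review what would be deleted, then again with `dry_run=False` on every node's data path before restarting the cluster.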
