Failed to add node

Dear All,

I cannot add a data node to the cluster with the dedicated master-eligible node. We have one master-eligible node and two dedicated data nodes.

Master node configuration (192.168.11.12):
node.name: ${HOSTNAME}
node.data: false
node.ingest: false
node.ml: false
cluster.remote.connect: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["localhost", "192.168.11.12"]
discovery.seed_hosts: ["192.168.11.12", "192.168.11.10", "192.168.11.11"]
cluster.initial_master_nodes: ["masternode"]

Data node configuration (192.168.11.10):
node.name: ${HOSTNAME}
node.master: false
node.ingest: false
node.data: true
cluster.remote.connect: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["localhost", "192.168.11.10"]
discovery.seed_hosts: ["masternode"]
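One thing worth double-checking in the data node's config above (a sketch, not a confirmed fix): `discovery.seed_hosts: ["masternode"]` only works if the hostname `masternode` resolves on the data node. Seeding by the master's IP avoids that dependency:

```yaml
# datanode elasticsearch.yml (sketch): seed the master by IP so discovery
# does not depend on the "masternode" hostname resolving on this host
discovery.seed_hosts: ["192.168.11.12"]
```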

This log is from the data node:
[2019-06-27T12:04:13,448][INFO ][o.e.c.c.JoinHelper ] [datanode1] failed to join {masternode}{mef6_j7CRwu4PaMOIvhzHQ}{85ljGwpvR5eeaU-HiW_FhQ}{192.168.11.12}{192.168.11.12:9300}{xpack.installed=true} with JoinRequest{sourceNode={datanode1}{mef6_j7CRwu4PaMOIvhzHQ}{jNnXP2V9Sg6i9R-4Ya7baA}{192.168.11.10}{192.168.11.10:9300}{ml.machine_memory=4072120320, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=13, lastAcceptedTerm=1, lastAcceptedVersion=23, sourceNode={datanode1}{mef6_j7CRwu4PaMOIvhzHQ}{jNnXP2V9Sg6i9R-4Ya7baA}{192.168.11.10}{192.168.11.10:9300}{ml.machine_memory=4072120320, xpack.installed=true, ml.max_open_jobs=20}, targetNode={masternode}{mef6_j7CRwu4PaMOIvhzHQ}{85ljGwpvR5eeaU-HiW_FhQ}{192.168.11.12}{192.168.11.12:9300}{xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [masternode][192.168.11.12:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalArgumentException: can't add node {datanode1}{mef6_j7CRwu4PaMOIvhzHQ}{jNnXP2V9Sg6i9R-4Ya7baA}{192.168.11.10}{192.168.11.10:9300}{ml.machine_memory=4072120320, ml.max_open_jobs=20, xpack.installed=true}, found existing node {masternode}{mef6_j7CRwu4PaMOIvhzHQ}{85ljGwpvR5eeaU-HiW_FhQ}{192.168.11.12}{192.168.11.12:9300}{xpack.installed=true} with the same id but is a different node instance
at org.elasticsearch.cluster.node.DiscoveryNodes$Builder.add(DiscoveryNodes.java:606) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.JoinTaskExecutor.execute(JoinTaskExecutor.java:142) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.coordination.JoinHelper$1.execute(JoinHelper.java:118) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-7.2.0.jar:7.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]

Please, any advice and help would be appreciated.

Thanks

The key message is here:

Your two nodes have the same node ID, mef6_j7CRwu4PaMOIvhzHQ, which indicates you've copied the data path. You should start your new node up with an empty data path instead.

Hi DavidTurner,

How do I do that? By deleting the data files? Please advise.

Thanks for the reply

Set path.data to an empty directory. If you copied files into the data path by hand then you should undo that by deleting them.
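The advice above can be sketched as a small shell sequence. This is illustrative only: `DATA_PATH` here is a temporary stand-in for `/var/lib/elasticsearch`, and on a real host you must stop the node first (e.g. `systemctl stop elasticsearch`) before touching the directory.

```shell
# Sketch: give the cloned node a fresh, empty data path so Elasticsearch
# generates a new node ID on first start. DATA_PATH is a stand-in for
# /var/lib/elasticsearch; adjust for your install.
DATA_PATH="${DATA_PATH:-/tmp/es-demo-data}"
mkdir -p "$DATA_PATH/nodes/0"   # simulate the copied node state
rm -rf "$DATA_PATH/nodes"       # remove it: the data path must be empty
ls -A "$DATA_PATH"              # prints nothing when the path is empty
```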

Can I just delete the contents of path.data, or should I create a new data path?
If I delete /var/lib/elasticsearch, will Elasticsearch rebuild it automatically, or do we have to create it manually?

Thanks

You probably need /var/lib/elasticsearch to exist with the correct permissions, because Elasticsearch shouldn't have the authority to create this folder if it's missing. But it shouldn't contain any files or subfolders the first time it starts up.
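A minimal sketch of recreating the directory itself, empty and with restrictive permissions. On a real host you would also need `chown elasticsearch:elasticsearch /var/lib/elasticsearch` as root (ownership assumes the RPM/DEB package layout); the sketch uses a temporary path so it runs unprivileged.

```shell
# Sketch: recreate the data directory, empty, with restrictive permissions.
# ES_DATA stands in for /var/lib/elasticsearch.
ES_DATA="${ES_DATA:-/tmp/es-demo-lib}"
rm -rf "$ES_DATA"
mkdir -p "$ES_DATA"
chmod 750 "$ES_DATA"
stat -c '%a' "$ES_DATA"   # prints 750 on Linux
```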

So what should I do? Delete the files inside the data path, or something else?
I tried restarting and stopping; it's not working. How did this happen?

Thanks

Sorry, I can't really confirm that you should just delete some files on your system without knowing a lot more details about the situation you're in. It's really unclear how your system is set up, or how you got into this state, and deleting files is not something you can easily undo.

You need to make sure that the very first time a new Elasticsearch node starts up it has a path.data setting that is pointing to an empty directory. Making sure that the directory is empty is up to you.


Hi DavidTurner,

It's working now. I just deleted the nodes directory inside /var/lib/elasticsearch.
I installed Elasticsearch from the yum repository.

Thanks for taking the time to reply to my question.


Got the same issue after cloning a VM host. I tried stopping both nodes, removing everything under path.data, and then restarting both nodes, but launching the second node still causes this issue.

Could some data still be cached in the other running master nodes somehow?

Is there any way to inspect or modify the cluster info through the REST API?

Any hints appreciated, TIA

Hi @stefws, if you're still having issues after clearing out the data path then it's not the same issue as the one we're discussing in this thread. I suggest you open a new thread and share some logs showing the problems.

Looks the same to me:

[2019-07-11T09:05:04,913][INFO ][o.e.d.z.ZenDiscovery ] [es-mst2] failed to send join request to master [{d1r2n10}{ByigHkGrRMeYVtin7phKDw}{iL03BcseQNy7y-Qd1dRBRg}{<ip>}{<ip>:9300}{ml.machine_memory=67496067072, rack=OPA3.4.16, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[d1r2n10][<ip>:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {es-mst2}{AZngio9xQzqh0MHUzOa5iA}{gEFR_nWTTqup4kS-zjcAiQ}{<ip2>}{<ip2>:9300}{ml.machine_memory=4137517056, rack=OPA3.3.16, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, found existing node {es-mst1}{AZngio9xQzqh0MHUzOa5iA}{MFnNbFAGQjCnT3FmTgYGFg}{<ip3>}{<ip3>:9300}{ml.machine_memory=4137517056, rack=OPA3.3.16, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} with the same id but is a different node instance]; ]

It does look the same, yes. Both nodes have ID AZngio9xQzqh0MHUzOa5iA. This ID is stored in the data path, and a new one is generated randomly if the data path is empty. I think this means your attempt to clear the data path was unsuccessful.

Hm, now it works. Forget everything I said after my last good morning :slight_smile: Thanks!


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.