Failed to send join request to master due to same id

I used a template to create two Elasticsearch instances on two AWS EC2 instances, with the same Docker container (5.2) and the same data folder. However, the Elasticsearch instances failed to join a cluster, and the error message is:

failed to send join request to master [{2-mHh6b}{2-mHh6bOQDm-X8vgtnFteA}{GgBr8OgjRBKEDG00adq4qA}{172.31.1.28}{172.31.1.28:9300}], reason [RemoteTransportException[[2-mHh6b][172.31.1.28:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {ZoX1MdS}{ZoX1MdSASkCu8omyUdXNhg}{jJkHlyqEQB-i3a03CP2yIw}{172.31.7.117}{172.31.7.117:9300}, found existing node {ZoX1MdS}{ZoX1MdSASkCu8omyUdXNhg}{aeBGGGrWR72YO_P_Dyx9Cg}{172.31.9.253}{172.31.9.253:9300} with the same id but is a different node instance]; ]
[2017-03-01T03:44:38,084][WARN ][o.e.t.n.Netty4Transport ] [ZoX1MdS] exception caught on transport layer [[id: 0x764a7b20, L:/172.31.7.117:58318 - R:/172.31.13.77:9300]], closing connection

I understand I should probably let Elasticsearch copy the data folder itself, but copying the data folder takes time, and it means I can no longer use a template to create multiple EC2 instances. Is there any way to get around this? Maybe changing the node id is a solution, but I couldn't find any documentation on how to change a node id.

Thanks.

You cannot do this.

Hi Warkolm,

Thanks. The instances were able to form a cluster in version 1.6; however, it doesn't work in version 5.2. Is there another way to get around this?

Nope.
Why do you want to do this anyway?

Hi Warkolm,

Because I need to create a cluster by creating multiple Elasticsearch instances from the same AWS AMI. I'm using this cluster as a data processing engine, and it is destroyed right afterwards. I have to repeat this process from time to time.

Then why not install ES but not start it, so it's essentially a blank slate before you clone it?

Okay, that makes sense, but how can I revert my data folder to a blank slate? I'm assuming the data folder is contaminated as soon as one instance has started on it?

Correct.
Check out https://www.elastic.co/guide/en/elasticsearch/reference/5.2/install-elasticsearch.html for the various directories, depending on how you installed ES.

Hi Warkolm,

I went through the documentation but couldn't find any useful information on how to create a blank-slate data folder. Basically, I'm using the official Docker container and an external data folder to start the Elasticsearch instance. Is there any file in the data folder that can be removed so that it becomes a blank slate?

Thank you.

Delete anything in the data directory.

But the data directory is inside my AWS AMI, and I normally create multiple instances based on this AMI. If I delete the data directory, the AMI becomes an ES template without any data.
Or do I need to delete the data directory after the instances have been created and leave only one of them intact?

Thanks.

Just to give you a bit more background: our infrastructure is on AWS and we use Elasticsearch as a data processing engine. The way it works is like this:

We have a pre-built AMI with the official Elasticsearch Docker container installed; a data folder (around 250 GB of data) inside the AMI is mounted as the Elasticsearch data folder when the container starts. Below is my docker-compose file:

es:
  image: ci.insightds.com.au:5001/elasticsearch-avm-es
  environment:
    - bootstrap.memory_lock=true
    - http.host=0.0.0.0
    - network.host=_site_
    - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  cap_add:
    - IPC_LOCK
  volumes:
    - /data:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
  net: host
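
For the three data-less nodes described in the next paragraph, the compose file is the same as above but without the volumes mount; assuming nothing else changes, that variant looks roughly like this:

es:
  image: ci.insightds.com.au:5001/elasticsearch-avm-es
  environment:
    - bootstrap.memory_lock=true
    - http.host=0.0.0.0
    - network.host=_site_
    - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  cap_add:
    - IPC_LOCK
  # no volumes entry, so this node starts with an empty data directory inside the container
  ports:
    - 9200:9200
  net: host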

As soon as we start our process, 5 EC2 instances of the pre-built AMI are created: 3 of them are pure ES instances without a data folder (the docker-compose file is the same as the first one above, but without the volumes mount), and 2 of them are normal ES instances with the data folder. In Elasticsearch version 1.6, these five instances would form a cluster. However, after I upgraded Elasticsearch to 5.2, one of the data nodes stopped joining the other 4 nodes in the cluster. Below is the error message:


failed to send join request to master [{2-mHh6b}{2-mHh6bOQDm-X8vgtnFteA}{GgBr8OgjRBKEDG00adq4qA}{172.31.1.28}{172.31.1.28:9300}], reason [RemoteTransportException[[2-mHh6b][172.31.1.28:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {ZoX1MdS}{ZoX1MdSASkCu8omyUdXNhg}{jJkHlyqEQB-i3a03CP2yIw}{172.31.7.117}{172.31.7.117:9300}, found existing node {ZoX1MdS}{ZoX1MdSASkCu8omyUdXNhg}{aeBGGGrWR72YO_P_Dyx9Cg}{172.31.9.253}{172.31.9.253:9300} with the same id but is a different node instance]; ]
[2017-03-01T03:44:38,084][WARN ][o.e.t.n.Netty4Transport ] [ZoX1MdS] exception caught on transport layer [[id: 0x764a7b20, L:/172.31.7.117:58318 - R:/172.31.13.77:9300]], closing connection

I really don't want to go down the path of deleting the data folder from one of the data nodes so that Elasticsearch copies the data over from the other data node, because it would take a long time to copy 250 GB of data over the network. Is there any other way to get around this?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.