ES under Docker Swarm fails with "found existing node with the same id but is a different node instance"

saultocsin · February 19, 2018, 7:45pm

Hello All,

Although I know my way around Docker to some extent, I am very new to ES.

Some environmental information:

centos-release-7-4.1708.el7.centos.x86_64
Docker version 17.12.0-ce, build c97c6d6
ES Version: 5.6.7, Build: 4669214/2018-01-25T21:14:50.776Z, JVM: 1.8.0_151

Based on what I've read on the Web, the Dockerized version of ES that I am using (pulled via 'elasticsearch:5') may not be the official ES image, but one maintained by Docker.

The Docker compose yml file looks like this:

`version: "3.3"

services:
  elasticsearch:
    command: >
      elasticsearch
      -E discovery.zen.ping.unicast.hosts=elasticsearch
      -E discovery.zen.minimum_master_nodes=1
      -E node.max_local_storage_nodes=1
      -E network.host=0.0.0.0
    image: elasticsearch:5  # unofficial (from Docker) image; runs as user root
    deploy:
      mode: replicated
      endpoint_mode: dnsrr
    volumes:
      - type: volume
        source: nfs_share
        target: /usr/share/elasticsearch/data
        volume:
          nocopy: true

  nginx:
    image: 'nginx:1'
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: ingress
    command: |
      /bin/bash -c "echo '
      server {
        listen 9200;
        add_header X-Frame-Options "SAMEORIGIN";
        client_max_body_size 64M;

        location / {
          proxy_pass http://elasticsearch:9200;
          proxy_http_version 1.1;
          proxy_set_header Connection keep-alive;
          proxy_set_header Upgrade $$http_upgrade;
          proxy_set_header Host $$host;
          proxy_set_header X-Real-IP $$remote_addr;
          proxy_cache_bypass $$http_upgrade;
        }
      }' | tee /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"

volumes:
  nfs_share:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=71.100.14.14,rsize=1048576,wsize=1048576,nolock,soft,rw,timeo=600,retrans=2"
      device: ":/export/zfs/saul"`

I start the stack via `docker stack deploy --compose-file ./esnginx2.yml nfsnginx2`

nginx (under service nfsnginx2_nginx) comes up on one of the 2 CentOS nodes; ES (under service nfsnginx2_elasticsearch) on the other. So far so good.

But when I issue `docker service scale nfsnginx2_elasticsearch=2`, the 2nd ES node is started but never joins the ES cluster. Its log shows:

[2018-02-19T16:08:38,198][INFO ][o.e.d.z.ZenDiscovery ] [fb5lofA] failed to send join request to master [{fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300}], reason [RemoteTransportException[[fb5lofA][10.0.0.4:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{zn7-jwGiSQef6_GcejrvQw}{10.0.0.17}{10.0.0.17:9300}, found existing node {fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300} with the same id but is a different node instance]; ]
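
To make that message easier to read, here is a minimal Python sketch that splits the two node descriptors into fields and compares them. It assumes the five braced fields are, in order, node name, node ID, ephemeral ID, host, and transport address; that ordering is an assumption about the ES 5.x log format, and the names `NodeDescriptor` and `parse_descriptors` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class NodeDescriptor(NamedTuple):
    # Field order is an assumption about how ES 5.x prints a node.
    name: str
    node_id: str
    ephemeral_id: str
    host: str
    address: str


# Five consecutive {...} groups make up one node descriptor.
DESCRIPTOR = re.compile(r"\{([^{}]*)\}" * 5)


def parse_descriptors(text: str) -> List[NodeDescriptor]:
    """Return every five-field node descriptor found in a log message."""
    return [NodeDescriptor(*m.groups()) for m in DESCRIPTOR.finditer(text)]


# The relevant part of the exception text from the log line above.
MESSAGE = (
    "can't add node {fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{zn7-jwGiSQef6_GcejrvQw}"
    "{10.0.0.17}{10.0.0.17:9300}, found existing node {fb5lofA}"
    "{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300} "
    "with the same id but is a different node instance"
)

joining, existing = parse_descriptors(MESSAGE)
print("same node id:     ", joining.node_id == existing.node_id)
print("same ephemeral id:", joining.ephemeral_id == existing.ephemeral_id)
print("addresses:        ", joining.address, "vs", existing.address)
```

Read this way, both containers report the same second field while the third field and the addresses differ, which matches the wording "same id but is a different node instance".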

Some cursory reading on the Web seemed to indicate that an ES "node UUID" is stored in the "ES data folder". Some folks said they got it to work by deleting the contents of the data folder, e.g., /var/lib/elasticsearch/nodes/0, and then restarting ES.

This doesn't seem like an especially clean solution; nor did it work for me. So I am wondering:

What is the actual cause of this error?
Is it possible that the ES image (again, perhaps not from Elastic itself) contains a directory with a node UUID and that this is getting duplicated by the scale up and thus causing the error?
What are possible approaches to resolve this?
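
One way to probe the second question is to look for persisted node state under the data path from inside each container. The sketch below is only that, a sketch: it assumes ES 5.x keeps node metadata in files matching `nodes/<ordinal>/_state/node-*.st` below the data path (an assumption on my part, not something I have confirmed for this image), and `find_node_state_files` is a hypothetical helper name. The path `/usr/share/elasticsearch/data` is the volume target from the compose file.

```python
import sys
from pathlib import Path
from typing import List


def find_node_state_files(data_path: str) -> List[Path]:
    """List files matching nodes/<ordinal>/_state/node-*.st under data_path.

    The directory layout is an assumption about ES 5.x, not a verified fact.
    """
    return sorted(Path(data_path).glob("nodes/*/_state/node-*.st"))


if __name__ == "__main__":
    # Default to the volume target used in the compose file above.
    data_path = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/elasticsearch/data"
    files = find_node_state_files(data_path)
    if not files:
        print(f"no node state files found under {data_path}")
    for state_file in files:
        # Size and mtime help tell whether two containers see the same file.
        stat = state_file.stat()
        print(state_file, stat.st_size, int(stat.st_mtime))
```

If the same files (same size and modification time) show up in both containers, that would suggest the state comes from the shared `nfs_share` volume rather than from the image; if they show up on a fresh container with an empty volume, the image would be the more likely source.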

I would be most grateful for some help with this.

Thanks.

-Saul

system · March 19, 2018, 7:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
