Hello All,
Although I know my way around Docker to some extent, I am very new to Elasticsearch (ES).
Some environment information:
- OS: centos-release-7-4.1708.el7.centos.x86_64
- Docker: version 17.12.0-ce, build c97c6d6
- ES: Version: 5.6.7, Build: 4669214/2018-01-25T21:14:50.776Z, JVM: 1.8.0_151
From what I've read on the Web, the Dockerized version of ES (pulled via 'elasticsearch:5') may not be the official ES image, but one maintained by Docker.
The Docker Compose YAML file looks like this:
```
version: "3.3"
services:
  elasticsearch:
    command: >
      elasticsearch
      -E discovery.zen.ping.unicast.hosts=elasticsearch
      -E discovery.zen.minimum_master_nodes=1
      -E node.max_local_storage_nodes=1
      -E network.host=0.0.0.0
    image: elasticsearch:5 # unofficial (from Docker) image; runs as user root
    deploy:
      mode: replicated
      endpoint_mode: dnsrr
    volumes:
      - type: volume
        source: nfs_share
        target: /usr/share/elasticsearch/data
        volume:
          nocopy: true
  nginx:
    image: 'nginx:1'
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: ingress
    command: |
      /bin/bash -c "echo '
      server {
        listen 9200;
        add_header X-Frame-Options "SAMEORIGIN";
        client_max_body_size 64M;
        location / {
          proxy_pass http://elasticsearch:9200;
          proxy_http_version 1.1;
          proxy_set_header Connection keep-alive;
          proxy_set_header Upgrade $$http_upgrade;
          proxy_set_header Host $$host;
          proxy_set_header X-Real-IP $$remote_addr;
          proxy_cache_bypass $$http_upgrade;
        }
      }' | tee /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"
volumes:
  nfs_share:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=71.100.14.14,rsize=1048576,wsize=1048576,nolock,soft,rw,timeo=600,retrans=2"
      device: ":/export/zfs/saul"
```
I start the stack via `docker stack deploy --compose-file ./esnginx2.yml nfsnginx2`.
nginx (under service nfsnginx2_nginx) comes up on one of the 2 CentOS nodes; ES (under service nfsnginx2_elasticsearch) on the other. So far so good.
But when I issue `docker service scale nfsnginx2_elasticsearch=2`, the 2nd ES node is started but it never joins the ES cluster. Its log shows:
[2018-02-19T16:08:38,198][INFO ][o.e.d.z.ZenDiscovery ] [fb5lofA] failed to send join request to master [{fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300}], reason [RemoteTransportException[[fb5lofA][10.0.0.4:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{zn7-jwGiSQef6_GcejrvQw}{10.0.0.17}{10.0.0.17:9300}, found existing node {fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300} with the same id but is a different node instance]; ]
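To make that log line easier to read, here is a small Python sketch that splits the two node descriptors into fields and compares them. The field names (`name`, `node_id`, `ephemeral_id`, `host`, `transport_address`) are my own guess at what each pair of braces means, based on the wording of the error; I have not confirmed them against the ES source.

```python
import re

# The two node descriptors quoted in the join failure above.
JOINING = "{fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{zn7-jwGiSQef6_GcejrvQw}{10.0.0.17}{10.0.0.17:9300}"
EXISTING = "{fb5lofA}{fb5lofA-RPqtftVfhoKeiA}{JdGOhu_PRYe0m1FgqMw3Zw}{10.0.0.4}{10.0.0.4:9300}"

# ASSUMPTION: this field order is my reading of the log format, not a documented API.
FIELDS = ("name", "node_id", "ephemeral_id", "host", "transport_address")


def parse_node(descriptor):
    """Split a '{a}{b}{c}...' descriptor into a dict keyed by FIELDS."""
    values = re.findall(r"\{([^{}]*)\}", descriptor)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def compare(a, b):
    """Return, per field, whether the two parsed nodes carry the same value."""
    return {field: a[field] == b[field] for field in FIELDS}


joining = parse_node(JOINING)
existing = parse_node(EXISTING)
same = compare(joining, existing)

for field in FIELDS:
    marker = "same" if same[field] else "DIFFERENT"
    print("%-18s %-10s %s | %s" % (field, marker, joining[field], existing[field]))
```

If that reading is right, both containers report the same value in the second field and differ only in the third field and the address, which would match the "same id but is a different node instance" wording.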
Some cursory reading on the Web seemed to indicate that an ES "node UUID" is stored in the "ES data folder". Some folks said they got it to work by deleting the contents of the data folder, e.g., /var/lib/elasticsearch/nodes/0, and then restarting ES.
This doesn't seem like an especially clean solution; nor did it work for me. So I am wondering:
- What is the actual cause of this error?
- Is it possible that the ES image (again, perhaps not from Elastic itself) contains a directory with a node UUID, and that this is duplicated when I scale up, thus causing the error?
- What are possible approaches to resolve this?
I would be most grateful for some help with this.
Thanks.
-Saul