Elasticsearch masters not discovering each other on Docker Swarm

I am trying to set up an Elasticsearch stack using Docker Swarm (I don't really need Swarm functionality, but the .yml has already been written).

The problem is that when I start up the stack, the two masters can't resolve each other (even though I have inspected them to confirm they're on the same network).
It may be the fault of discovery.seed_hosts providing an empty list to search with. I suspect this because of these lines in both of their logs:

 "message":"publish_address {10.0.120.16:9300}, bound_addresses {0.0.0.0:9300}"
 "message":"bound or publishing to a non-loopback address, enforcing bootstrap checks",
 "message":"failed to resolve host [es-master2]"
 "message":"master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes [es-master1, es-master2] to bootstrap a cluster: have discovered [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}]; discovery will continue using [] from hosts providers and [{es-master1}{SmILpuyqTGCkzzG5KmMIRg}{ru7wfhJTRkSXitkT5Ubhgw}{10.0.115.6}{10.0.115.6:9300}{cdfhilmrstw}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

I'm not sure whether the "discovery will continue using [] from hosts providers" part means that something went wrong with the discovery.seed_hosts setting. I suspect so because without the setting the list is replaced with [127.0.0.1:9300,... etc].

Here are the relevant parts of my docker compose file. It's part of a bigger file, but right now I just need the two masters to talk.

.
.
.
networks:
  es-internal:
    driver: overlay
  es-connection:
    driver: overlay
  external:
    driver: overlay
    name: external
.
.
.
  es-coordination1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-coordination1
    environment:
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - node.name=es-coordination1
      - cluster.name=${CLUSTER_NAME}
      - network.host=0.0.0.0
      - discovery.seed_hosts=es-master1,es-master2
      - cluster.initial_master_nodes=es-master1,es-master2
      - node_role=""
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-coordination1.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-coordination1.crt
      - xpack.security.http.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-coordination1.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-coordination1.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    networks:
      - es-internal
      - es-connection
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: host
    volumes:
      - ./data/es-coordination1:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-coordination1-crt
        target: /usr/share/elasticsearch/config/certs/es-coordination1.crt
      - source: es-coordination1-key
        target: /usr/share/elasticsearch/config/certs/es-coordination1.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s

  es-master1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master1
    environment:
      - node.name=es-master1
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master2
#      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.http.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master1.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master1.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master1:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master1-crt
        target: /usr/share/elasticsearch/config/certs/es-master1.crt
      - source: es-master1-key
        target: /usr/share/elasticsearch/config/certs/es-master1.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s

  es-master2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.2.3
    hostname: es-master2
    environment:
      - node.name=es-master2
      - cluster.name=${CLUSTER_NAME}
      - node_role="master,ingest"
      - cluster.initial_master_nodes=es-master1,es-master2
      - discovery.seed_hosts=es-master1
#      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.http.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.http.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=/usr/share/elasticsearch/config/certs/es-master2.key
      - xpack.security.transport.ssl.certificate=/usr/share/elasticsearch/config/certs/es-master2.crt
      - xpack.security.transport.ssl.certificate_authorities=/usr/share/elasticsearch/config/certs/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
    networks:
      - es-internal
    volumes:
      - ./data/es-master2:/usr/share/elasticsearch/data
    secrets:
      - source: ca-crt
        target: /usr/share/elasticsearch/config/certs/ca.crt
      - source: es-master2-crt
        target: /usr/share/elasticsearch/config/certs/es-master2.crt
      - source: es-master2-key
        target: /usr/share/elasticsearch/config/certs/es-master2.key
    configs:
      - source: jvm-coordination
        target: /usr/share/elasticsearch/config/jvm.options.d/jvm-coordination
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 1G
    depends_on:
      - es-coordination
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
.
.
.

If anyone has some input it would be great!

Any takers? It's the communication between es-master1 and es-master2 I am trying to fix.

es-coordination is there in case someone might recognize a problem with having an internal network for other nodes.

Yes, that plus failed to resolve host [es-master2]. This node has no idea how to contact the other node.
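To illustrate why a failed lookup leaves the node with nothing to contact, here is a minimal Python sketch of the idea. It is not Elasticsearch's actual code; the function name resolve_seed_hosts is made up for illustration, and the sketch only assumes that each configured seed host is turned into addresses by a DNS lookup and that names which fail to resolve are skipped.

```python
import socket


def resolve_seed_hosts(seed_hosts, port=9300):
    """Return 'ip:port' strings for every seed host that resolves.

    Hypothetical helper: hosts that fail DNS resolution are reported
    and skipped, so the result can be an empty list.
    """
    resolved = []
    for host in seed_hosts:
        try:
            # IPv4 only, to keep the 'ip:port' formatting simple.
            infos = socket.getaddrinfo(
                host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
            )
        except socket.gaierror as err:
            print(f"failed to resolve host [{host}]: {err}")
            continue
        for info in infos:
            address = f"{info[4][0]}:{port}"
            if address not in resolved:
                resolved.append(address)
    return resolved
```

If the only configured seed host is a name that does not resolve, the function returns an empty list, which would match the "[] from hosts providers" part of the log message.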

Do you see any reason for it? Is it just the environment variables being buggy perhaps?

You need to work out why the name es-master2 can't be resolved to the right IP address. Maybe it's DNS, maybe something else in your environment. Unfortunately this is more of a general sysadmin question than anything to do with Elasticsearch so I don't think we can be of much help here.
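One way to narrow this down is to check, from a container attached to the same overlay network, whether the name resolves at all and whether the transport port answers on the resolved address. The following is a rough Python sketch under a few assumptions: the helper name check_transport is invented here, the transport port is the default 9300 seen in the logs, and Python is available in whatever container you run it from (the Elasticsearch image may not ship it, so a separate throwaway service on the es-internal network may be needed).

```python
import socket


def check_transport(host, port=9300, timeout=3.0):
    """Resolve host, then try a TCP connect to each IPv4 address found.

    Hypothetical diagnostic helper. Returns a list of (ip, reachable)
    tuples, or an empty list if the name does not resolve.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    results = []
    # De-duplicate addresses; sorted only to make the output stable.
    for ip in sorted({info[4][0] for info in infos}):
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                results.append((ip, True))
        except OSError:
            results.append((ip, False))
    return results
```

Calling check_transport("es-master2") from the es-master1 side (and the reverse) separates the two cases: an empty list points at name resolution, while an address marked unreachable points at connectivity to the transport port.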

It does not seem like a Docker problem: I can make a swarm service with 2 containers that have iputils installed, and they ping each other via hostname with no problem. All ports on an overlay network should also be open internally by default.
