Cannot add a node to cluster on remote host

Hi,

I've been running an Elasticsearch instance using Docker and docker-compose on host A.
The configs are

version: '2.2'
services:
  es01:
    build: .
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

and

FROM docker.elastic.co/elasticsearch/elasticsearch:7.6.2
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-icu
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-kuromoji

This is up and running successfully. I've already added several documents and indices.

GET /_cluster/health?pretty
{
  "cluster_name" : "es-docker-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 55.172413793103445
}

I want to add a node running on another host, B, to this cluster.
So I set up host B and tried to join a node from there.

version: '2.2'
services:
  es02:
    build: .
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=10.146.0.2
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data02:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

and the same Dockerfile as host A.

But the discovery process never ends and the node cannot join the cluster.

{"type": "server", "timestamp": "2020-04-17T13:18:32,258Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es01] to bootstrap a cluster: have discovered [{es02}{k5uDRt9aQCqCf-sVlGmQ8A}{_Miq00-CT6igXR5lFgzt6g}{172.19.0.2}{172.19.0.2:9300}{dilm}{ml.machine_memory=4148080640, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.146.0.2:9300] from hosts providers and [{es02}{k5uDRt9aQCqCf-sVlGmQ8A}{_Miq00-CT6igXR5lFgzt6g}{172.19.0.2}{172.19.0.2:9300}{dilm}{ml.machine_memory=4148080640, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

It looks like the Docker container on host B can reach the Docker container on host A:

$ sudo docker exec -it es02 sh
sh-4.2# curl 10.146.0.2:9200
{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "mAXf03ZXReS6Gun1HyMGsg",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Also, when I change cluster.name on the ES instance in the container on host A, the error message changes. That suggests the ES instance on host B does see host A, so the network seems to be configured correctly.

What can I do?

Any hint would be appreciated.

Sounds like a connectivity issue, and I think some logging improvements coming in 7.7.0 will help diagnose this kind of problem. This looks wrong:

networks:
  elastic:
    driver: bridge

Quoting these docs:

Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.

Also your nodes do not seem to be on the same network as they have wildly different IP addresses, 172.19.0.2 vs 10.146.0.2.

Thanks for your reply.

Also your nodes do not seem to be on the same network as they have wildly different IP addresses, 172.19.0.2 vs 10.146.0.2.

Yeah, but does it make a difference?
As the last code example shows, the es02 node can reach 10.146.0.2.

I also tried removing driver: bridge, with no luck.

I finally got it working.
The key was adding 'network.publish_host=[local IP address of the host itself]' to each container's config, so that each node advertises its own host's address.
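For anyone landing here later, this is a minimal sketch of what that change might look like in the two compose files. It is not the exact config from this thread. The value 10.146.0.2 is host A's address from the posts above; HOST_B_IP is a placeholder, since host B's address was never given. The likely reason this helps: with network.host=0.0.0.0 inside a bridge network, the node seems to advertise its container address (172.19.0.2 in the log above), which the other host cannot route to. Setting network.publish_host makes it advertise the host's address instead, which is reachable through the mapped 9300 port.

```yaml
# Host A: docker-compose.yml (only the relevant part of the es01 service)
services:
  es01:
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      # Advertise host A's own address instead of the container's bridge IP
      - network.publish_host=10.146.0.2
    ports:
      - 9200:9200
      - 9300:9300
---
# Host B: docker-compose.yml (only the relevant part of the es02 service)
services:
  es02:
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=10.146.0.2
      - network.host=0.0.0.0
      # Assumption: replace HOST_B_IP with host B's own address, reachable from host A
      - network.publish_host=HOST_B_IP
    ports:
      - 9200:9200
      - 9300:9300
```

After restarting both containers, GET /_cat/nodes?v on either node should list both es01 and es02 if the join succeeded.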
