Cannot add a node to cluster on remote host

yk928 · April 18, 2020, 2:08am

Hi,

I've been running an Elasticsearch instance using Docker and docker-compose on host A.
The docker-compose config is:

version: '2.2'
services:
  es01:
    build: .
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

and the Dockerfile is:

FROM docker.elastic.co/elasticsearch/elasticsearch:7.6.2
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-icu
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-kuromoji

This is up and running successfully. I've already added several documents and indices.

GET /_cluster/health?pretty
{
  "cluster_name" : "es-docker-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 55.172413793103445
}

What I want to do is add a node to this cluster which is running on another host B.
So I set up host B and tried to join a node to the cluster.

version: '2.2'
services:
  es02:
    build: .
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=10.146.0.2
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./data02:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic

networks:
  elastic:
    driver: bridge

and the same Dockerfile as host A.

But the discovery process never ends and the node cannot join the cluster.

{"type": "server", "timestamp": "2020-04-17T13:18:32,258Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es01] to bootstrap a cluster: have discovered [{es02}{k5uDRt9aQCqCf-sVlGmQ8A}{_Miq00-CT6igXR5lFgzt6g}{172.19.0.2}{172.19.0.2:9300}{dilm}{ml.machine_memory=4148080640, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [10.146.0.2:9300] from hosts providers and [{es02}{k5uDRt9aQCqCf-sVlGmQ8A}{_Miq00-CT6igXR5lFgzt6g}{172.19.0.2}{172.19.0.2:9300}{dilm}{ml.machine_memory=4148080640, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

Looks like the docker container on host B reaches the docker container on host A.

$ sudo docker exec -it es02 sh
sh-4.2# curl 10.146.0.2:9200
{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "mAXf03ZXReS6Gun1HyMGsg",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Also, when I change cluster.name on the ES instance on host A, the error message changes. That means the ES instance in the container on host B recognizes host A, so the network seems to be configured correctly.

What can I do?

yk928 · April 27, 2020, 12:04pm

Any hint would be appreciated.

DavidTurner · April 27, 2020, 2:17pm

Sounds like a connectivity issue, but I think that some improvements to logging coming in 7.7.0 will help diagnose the problem better here. This looks wrong:

driver: bridge

Quoting the Docker docs:

Bridge networks apply to containers running on the same Docker daemon host. For communication among containers running on different Docker daemon hosts, you can either manage routing at the OS level, or you can use an overlay network.

Also your nodes do not seem to be on the same network as they have wildly different IP addresses, 172.19.0.2 vs 10.146.0.2.

yk928 · April 29, 2020, 7:00am

Thanks for your reply.

Also your nodes do not seem to be on the same network as they have wildly different IP addresses, 172.19.0.2 vs 10.146.0.2.

Yeah, but does it make a difference?
As the curl output in my first post shows, the es02 node can reach 10.146.0.2.

I also tried removing driver: bridge, with no luck.

yk928 · April 29, 2020, 7:12am

I finally got it working.
The key was adding network.publish_host=[local IP address of the host itself] to each container's config.
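
For anyone who finds this later, here is a rough sketch of what that change might look like in the es02 environment block on host B. The address 10.146.0.3 is only a placeholder I'm assuming for host B's own IP (it does not appear anywhere above), so substitute the real one. On host A the same setting would presumably use 10.146.0.2. Everything else is copied from the original config.

```yaml
# docker-compose.yml on host B (fragment; ulimits, volumes, ports and
# networks stay as in the original file)
services:
  es02:
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=10.146.0.2
      - cluster.initial_master_nodes=es01
      - network.host=0.0.0.0
      # Advertise the Docker host's address to other nodes instead of the
      # container's bridge address (172.19.0.2 in the log above).
      # 10.146.0.3 is a hypothetical placeholder for host B's own IP.
      - network.publish_host=10.146.0.3
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
```

As far as I understand it, this matters because without publish_host the node advertises its container address (the log shows es02 publishing 172.19.0.2:9300), and the node on the other host cannot connect back to that address. Since port 9300 is already mapped to the host, advertising the host's IP lets the other node reach it.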

