Adding Elasticsearch Node to ES Cluster

I'm trying to add an Elasticsearch node to my current single-node cluster, but the new node fails to join.

I'm using Ansible to deploy the Elasticsearch Docker container so I'll just post the portions of the Ansible playbooks that are relevant. The current ES version I am using is 7.5.0 on both nodes.

Current ES Node:

- name: run elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "elasticsearch:{{ elk_version_tag }}"
    state: started
    restart_policy: unless-stopped
    user: 1000
    volumes:
    - /opt/elk-docker/elasticsearch/data:/usr/share/elasticsearch/data
    - /opt/elasticsearch/ssl:/usr/share/elasticsearch/config/ssl
    log_driver: "json-file"
    log_options:
      max-size: "200m"
      max-file: "3"
    ports:
    - 9200:9200
    - 9300:9300
    env:
      node.master: "true"
      http.host: "0.0.0.0"
      transport.host: "0.0.0.0"
      xpack.security.enabled: "true"
      xpack.monitoring.enabled: "true"
      cluster.routing.allocation.disk.threshold_enabled: "true"
      node.name: "elk-1"
      cluster.name: "elk"
      cluster.initial_master_nodes: "elk-1"
      ELASTIC_PASSWORD: "{{ xpack_password }}"
      ES_JAVA_OPTS: -Xms12g -Xmx12g
      xpack.security.http.ssl.enabled: "true"
      xpack.security.http.ssl.client_authentication: "optional"
      xpack.security.transport.ssl.client_authentication: "none"
      xpack.security.transport.ssl.enabled: "true"
      xpack.security.http.ssl.key: /usr/share/elasticsearch/config/ssl/elasticsearch.key
      xpack.security.http.ssl.certificate: /usr/share/elasticsearch/config/ssl/elasticsearch.pem
      xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/ssl/elasticsearch.key
      xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/ssl/elasticsearch.pem
    ulimits:
    - nofile:65536:65536

New Node:

- name: run elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "elasticsearch:{{ elasticsearch_version }}"
    state: started
    restart_policy: unless-stopped
    user: 1000
    volumes:
    - elasticsearch:/usr/share/elasticsearch/data
    - /opt/elasticsearch/ssl:/usr/share/elasticsearch/config/ssl
    log_driver: "json-file"
    log_options:
      max-size: "200m"
      max-file: "3"
    ports:
    - 9200:9200
    - 9300:9300
    env:
      http.host: "0.0.0.0"
      transport.host: "0.0.0.0"
      xpack.security.enabled: "true"
      xpack.monitoring.enabled: "true"
      xpack.security.http.ssl.enabled: "true"
      xpack.security.http.ssl.certificate: "/usr/share/elasticsearch/config/ssl/elasticsearch.pem"
      xpack.security.http.ssl.key: "/usr/share/elasticsearch/config/ssl/elasticsearch.key"
      xpack.security.transport.ssl.enabled: "true"
      xpack.security.transport.ssl.certificate: "/usr/share/elasticsearch/config/ssl/elasticsearch.pem"
      xpack.security.transport.ssl.key: "/usr/share/elasticsearch/config/ssl/elasticsearch.key"
      xpack.security.transport.ssl.verification_mode: "none"
      discovery.seed_hosts: "master.my.domain"
      node.name: "elk-2"
      cluster.name: "elk"
      cluster.initial_master_nodes: "elk-1"
      ELASTIC_PASSWORD: "{{ elastic_password }}"
      ES_JAVA_OPTS: "{{ es_java_opts }}"
    ulimits:
    - nofile:65536:65536
    - memlock:-1:-1

I replaced my current ES node's resolvable hostname with master.my.domain for privacy reasons. However, I know that the new node's Elasticsearch container is able to see the master node:

[elasticsearch@edb12d778337 ~]$ curl https://master.my.domain:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}},"status":401}
[elasticsearch@edb12d778337 ~]$

(I didn't include credentials in the curl request because I'm just showing that the new node can see the master)

On the master node I can see traffic on port 9300 coming from my new node via tcpdump:

root@master:~# tcpdump -i ens160 host new-node.my.domain
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
10:00:18.414667 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [S], seq 810488902, win 29200, options [mss 1460,sackOK,TS val 551847975 ecr 0,nop,wscale 7], length 0
10:00:18.414930 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [S.], seq 2321176096, ack 810488903, win 28960, options [mss 1460,sackOK,TS val 904464596 ecr 551847975,nop,wscale 7], length 0
10:00:18.415334 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [.], ack 1, win 229, options [nop,nop,TS val 551847976 ecr 904464596], length 0
10:00:18.417440 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [P.], seq 1:333, ack 1, win 229, options [nop,nop,TS val 551847978 ecr 904464596], length 332
10:00:18.417503 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [.], ack 333, win 235, options [nop,nop,TS val 904464599 ecr 551847978], length 0
10:00:18.434308 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [P.], seq 1:6858, ack 333, win 235, options [nop,nop,TS val 904464616 ecr 551847978], length 6857
10:00:18.434734 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [.], ack 6858, win 336, options [nop,nop,TS val 551847996 ecr 904464616], length 0
10:00:18.437061 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [P.], seq 333:525, ack 6858, win 336, options [nop,nop,TS val 551847998 ecr 904464616], length 192
10:00:18.437777 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [P.], seq 6858:7009, ack 525, win 243, options [nop,nop,TS val 904464619 ecr 551847998], length 151
10:00:18.438789 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [P.], seq 525:710, ack 7009, win 358, options [nop,nop,TS val 551848000 ecr 904464619], length 185
10:00:18.439277 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [P.], seq 7009:7413, ack 710, win 252, options [nop,nop,TS val 904464621 ecr 551848000], length 404
10:00:18.439703 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [P.], seq 710:750, ack 7413, win 381, options [nop,nop,TS val 551848001 ecr 904464621], length 40
10:00:18.439815 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [F.], seq 750, ack 7413, win 381, options [nop,nop,TS val 551848001 ecr 904464621], length 0
10:00:18.440010 IP master.my.domain.9300 > new-node.my.domain.60378: Flags [F.], seq 7413, ack 751, win 252, options [nop,nop,TS val 904464621 ecr 551848001], length 0
10:00:18.440152 IP new-node.my.domain.60378 > master.my.domain.9300: Flags [.], ack 7414, win 381, options [nop,nop,TS val 551848001 ecr 904464621], length 0
10:00:19.416700 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [S], seq 2278802906, win 29200, options [mss 1460,sackOK,TS val 551848977 ecr 0,nop,wscale 7], length 0
10:00:19.416813 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [S.], seq 2901458393, ack 2278802907, win 28960, options [mss 1460,sackOK,TS val 904465598 ecr 551848977,nop,wscale 7], length 0
10:00:19.417032 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [.], ack 1, win 229, options [nop,nop,TS val 551848978 ecr 904465598], length 0
10:00:19.417909 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [P.], seq 1:333, ack 1, win 229, options [nop,nop,TS val 551848979 ecr 904465598], length 332
10:00:19.417962 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [.], ack 333, win 235, options [nop,nop,TS val 904465599 ecr 551848979], length 0
10:00:19.418675 IP new-node.my.domain.53310 > master.my.domain.1029: Flags [.], seq 1893618506:1893622850, ack 3895652123, win 229, options [nop,nop,TS val 2240775860 ecr 2384323137], length 4344
10:00:19.418814 IP new-node.my.domain.53310 > master.my.domain.1029: Flags [P.], seq 4344:10133, ack 1, win 229, options [nop,nop,TS val 2240775860 ecr 2384323137], length 5789
10:00:19.419077 IP master.my.domain.1029 > new-node.my.domain.53310: Flags [.], ack 10133, win 6276, options [nop,nop,TS val 2384324559 ecr 2240775860], length 0
10:00:19.422954 IP master.my.domain.1029 > new-node.my.domain.53310: Flags [P.], seq 1:7, ack 10133, win 6276, options [nop,nop,TS val 2384324563 ecr 2240775860], length 6
10:00:19.423091 IP new-node.my.domain.53310 > master.my.domain.1029: Flags [.], ack 7, win 229, options [nop,nop,TS val 2240775864 ecr 2384324563], length 0
10:00:19.436556 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [P.], seq 1:6858, ack 333, win 235, options [nop,nop,TS val 904465618 ecr 551848979], length 6857
10:00:19.436962 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [.], ack 6858, win 336, options [nop,nop,TS val 551848998 ecr 904465618], length 0
10:00:19.440869 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [P.], seq 333:525, ack 6858, win 336, options [nop,nop,TS val 551849002 ecr 904465618], length 192
10:00:19.441765 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [P.], seq 6858:7009, ack 525, win 243, options [nop,nop,TS val 904465623 ecr 551849002], length 151
10:00:19.443158 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [P.], seq 525:710, ack 7009, win 358, options [nop,nop,TS val 551849004 ecr 904465623], length 185
10:00:19.443528 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [P.], seq 7009:7413, ack 710, win 252, options [nop,nop,TS val 904465625 ecr 551849004], length 404
10:00:19.443914 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [P.], seq 710:750, ack 7413, win 381, options [nop,nop,TS val 551849005 ecr 904465625], length 40
10:00:19.444025 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [F.], seq 750, ack 7413, win 381, options [nop,nop,TS val 551849005 ecr 904465625], length 0
10:00:19.444302 IP master.my.domain.9300 > new-node.my.domain.60406: Flags [F.], seq 7413, ack 751, win 252, options [nop,nop,TS val 904465626 ecr 551849005], length 0
10:00:19.444427 IP new-node.my.domain.60406 > master.my.domain.9300: Flags [.], ack 7414, win 381, options [nop,nop,TS val 551849005 ecr 904465626], length 0

The logs I am getting on the new node show this error:

{"type": "server", "timestamp": "2020-03-17T15:25:40,353Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elk", "node.name": "elk-2", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elk-1] to bootstrap a cluster: have discovered [{elk-2}{M01ZLrZnQDiUQrojiPvAiQ}{RXkJwksNTsGHiJ5DclyR6g}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=16820195328, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [1xx.1xx.254.3:9300] from hosts providers and [{elk-2}{M01ZLrZnQDiUQrojiPvAiQ}{RXkJwksNTsGHiJ5DclyR6g}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=16820195328, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

(I have omitted the public IP from this log, the one ending in 254.3 -- the log shows the correct IP for my master host)

I don't see any error logs in Elasticsearch on my current master that seem to be relevant to this issue.

What am I doing wrong? Why can't my new ES node join the current single node cluster?

Any help would be greatly appreciated 🙂

You should not set cluster.initial_master_nodes when adding a node to an existing cluster.
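
As a rough sketch of what that means for the new node's playbook, the discovery-related entries under env could be reduced to the following. The values are the ones already posted above; only cluster.initial_master_nodes is dropped, since that setting is meant for bootstrapping a brand-new cluster.

```yaml
    env:
      # Points the new node at the existing master-eligible node.
      discovery.seed_hosts: "master.my.domain"
      node.name: "elk-2"
      cluster.name: "elk"
      # cluster.initial_master_nodes is intentionally absent: this node
      # joins the already-bootstrapped cluster instead of forming one.
```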

This node has not discovered elk-1 and I suspect that the addresses are wrong, since this node thinks its address is 172.17.0.2 which is very different from the discovery address of 1xx.1xx.254.3. Make sure you have set network.host to the right address or interface.


Edit: I see you're setting transport.host: 0.0.0.0 and http.host: 0.0.0.0 instead of network.host. I think it's simpler to set network.host instead, unless you want to listen for HTTP on different interfaces from transport connections. Also 0.0.0.0 tells Elasticsearch to use one of your local addresses for its internal purposes, but doesn't say which one. I think you need to be more specific here and tell it which address or interface to use.

@DavidTurner thanks for the quick response, wow that's awesome!

So I changed my Ansible playbook for both hosts to set the network.host parameter (to the IP addresses that they will use to communicate with each other), and I removed cluster.initial_master_nodes from the new node. I still seem to have the same issue:

{"type": "server", "timestamp": "2020-03-17T16:36:09,082Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "elk", "node.name": "elk-2", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{elk-2}{M01ZLrZnQDiUQrojiPvAiQ}{fXQU7CNXQaemWpEvWWZ3Ew}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=16820195328, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [140.197.254.3:9300] from hosts providers and [{elk-2}{M01ZLrZnQDiUQrojiPvAiQ}{fXQU7CNXQaemWpEvWWZ3Ew}{172.17.0.2}{172.17.0.2:9300}{dilm}{ml.machine_memory=16820195328, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

Do I need to also change the values of the following configuration parameters to equal the IP addresses of the ES hosts:

  http.host: "0.0.0.0"
  transport.host: "0.0.0.0"

I think you should remove transport.host and http.host. They default to network.host unless you set them explicitly, and because you are still setting them to 0.0.0.0 they override it. That's the problem you're facing.
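
A minimal sketch of that change for the new node, using the placeholder hostnames from earlier in this thread. It is a partial fragment (the other docker_container keys are omitted), and it assumes the container uses host networking, since the host's own address may not be bindable from inside a bridge-networked container.

```yaml
  docker_container:
    name: elasticsearch
    # Assumption: host networking, so the container can bind the host's
    # address. The ports: mapping is dropped when this is used.
    network_mode: host
    env:
      # http.host and transport.host are removed; both fall back to network.host.
      network.host: "new-node.my.domain"
      discovery.seed_hosts: "master.my.domain"
```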

That fixed it! Thank you SO MUCH for your suggestions.

To summarize the solution, I updated my Ansible playbooks by removing the following from both my existing ES host and the new ES host:

http.host: "0.0.0.0"
transport.host: "0.0.0.0"

I replaced them with network.host, set to the FQDN of each host. In order to bind to the FQDN of the host, I also had to modify my playbooks so that the ES Docker containers use network_mode: host, and remove the port bindings.

The full Ansible config for each host looks like this:

Current ES Node:

- name: run elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "elasticsearch:{{ elk_version_tag }}"
    state: started
    restart_policy: unless-stopped
    network_mode: host
    user: 1000
    volumes:
    - /opt/elk-docker/elasticsearch/data:/usr/share/elasticsearch/data
    - /opt/elasticsearch/ssl:/usr/share/elasticsearch/config/ssl
    log_driver: "json-file"
    log_options:
      max-size: "200m"
      max-file: "3"
    env:
      node.master: "true"
      network.host: "master.my.domain"
      xpack.security.enabled: "true"
      xpack.monitoring.enabled: "true"
      cluster.routing.allocation.disk.threshold_enabled: "true"
      node.name: "elk-1"
      cluster.name: "elk"
      cluster.initial_master_nodes: "elk-1"
      ELASTIC_PASSWORD: "{{ xpack_password }}"
      ES_JAVA_OPTS: -Xms12g -Xmx12g
      xpack.security.http.ssl.enabled: "true"
      xpack.security.http.ssl.client_authentication: "optional"
      xpack.security.transport.ssl.client_authentication: "none"
      xpack.security.transport.ssl.enabled: "true"
      xpack.security.http.ssl.key: /usr/share/elasticsearch/config/ssl/elasticsearch.key
      xpack.security.http.ssl.certificate: /usr/share/elasticsearch/config/ssl/elasticsearch.pem
      xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/ssl/elasticsearch.key
      xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/ssl/elasticsearch.pem
      #xpack.security.http.ssl.certificate_authorities: /usr/share/elasticsearch/config/ssl/elasticsearch-ca.pem
      #xpack.security.transport.ssl.certificate_authorities: /usr/share/elasticsearch/config/ssl/elasticsearch-ca.pem
    ulimits:
    - nofile:65536:65536

New Node:

- name: run elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "elasticsearch:{{ elasticsearch_version }}"
    state: started
    restart_policy: unless-stopped
    network_mode: host
    user: 1000
    volumes:
    - elasticsearch:/usr/share/elasticsearch/data
    - /opt/elasticsearch/ssl:/usr/share/elasticsearch/config/ssl
    log_driver: "json-file"
    log_options:
      max-size: "200m"
      max-file: "3"
    env:
      network.host: "new-node.my.domain"
      xpack.security.enabled: "true"
      xpack.monitoring.enabled: "true"
      xpack.security.http.ssl.enabled: "true"
      xpack.security.http.ssl.certificate: "/usr/share/elasticsearch/config/ssl/elasticsearch.pem"
      xpack.security.http.ssl.key: "/usr/share/elasticsearch/config/ssl/elasticsearch.key"
      xpack.security.transport.ssl.enabled: "true"
      xpack.security.transport.ssl.certificate: "/usr/share/elasticsearch/config/ssl/elasticsearch.pem"
      xpack.security.transport.ssl.key: "/usr/share/elasticsearch/config/ssl/elasticsearch.key"
      xpack.security.transport.ssl.verification_mode: "none"
      discovery.seed_hosts: "master.my.domain"
      node.name: "{{ node_name }}"
      cluster.name: "elk"
      ELASTIC_PASSWORD: "{{ elastic_password }}"
      ES_JAVA_OPTS: "{{ es_java_opts }}"
    ulimits:
    - nofile:65536:65536
    - memlock:-1:-1
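
One way to confirm from the playbook that both nodes have joined could look like the sketch below. It is not part of my original playbooks: the task names and the es_nodes variable are made up for illustration, it reuses the xpack_password variable from above, and it assumes the certificate on master.my.domain is trusted by the host running the task (the uri module validates certificates by default).

```yaml
- name: list the nodes that have joined the cluster
  uri:
    # _cat/nodes with format=json returns one JSON object per node.
    url: "https://master.my.domain:9200/_cat/nodes?format=json&h=name,ip,master"
    user: elastic
    password: "{{ xpack_password }}"
    force_basic_auth: true
    return_content: true
  register: es_nodes

- name: fail unless two nodes are listed
  assert:
    that:
      - es_nodes.json | length == 2
```

If the assertion fails, the registered es_nodes.json list shows which node names the master currently knows about.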

Looks good.

If your nodes all have the same interface name, e.g. eth0, then you can use this rather than the FQDN: network.host: _eth0_.
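
As a sketch, the env entry would then be identical on every node. The interface name eth0 is only an example (the tcpdump output earlier in this thread shows ens160 on the master), and this assumes the container can see the host's interfaces, as it does with network_mode: host.

```yaml
    env:
      # _eth0_ is resolved by Elasticsearch to the addresses of the
      # network interface named eth0 on the machine it runs on.
      network.host: "_eth0_"
```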
