New Elasticsearch node does not sync with existing cluster

Hi, I previously had a 3-node Elasticsearch cluster consisting of node 1, node 2, and node 3, but node 2 ran into some OS-related issues and had to be reformatted.

I reinstalled and reconfigured node 2 exactly as it was before the format, but somehow node 2 is not joining the existing cluster and has formed a new Elasticsearch cluster instead. All my nodes are running the same version, 7.12.0.

When I list the nodes from the primary/master node using curl, it only shows node 1 and node 3:

[root@node-1 ~]# curl -k --user elastic -X GET "https://node-1:9200/_cat/nodes?pretty"
Enter host password for user 'elastic':
192.168.23.90 44 99 4 0.00 0.03 0.02 cdfhilmrstw * node-1
192.168.23.92 52 98 1 0.00 0.03 0.05 cdfhilmrstw - node-3
[root@node-1 ~]#

I am able to view the cluster status for node 1:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-1:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 914,
  "active_shards" : 1820,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

and also node 3:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-3:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 914,
  "active_shards" : 1820,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[root@node-1 ~]#

When I check the cluster status for node 2, I get "no route to host" instead:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'elastic':
curl: (7) Failed connect to node-2:9200; No route to host

What confuses me is that when I check the cluster status directly from node 2, rather than from the existing primary node, I get a reply instead of "no route to host":

[root@node-2 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[root@node-2 ~]#

Below are snippets of my elasticsearch.yml file for node 1:

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log

# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-1
node.master: true

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

elasticsearch.yml for node 2:

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log
node.data: true

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-2
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

Lastly, elasticsearch.yml for node 3:

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-3
node.name: node-3
node.data: true

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

I have had this issue for days and have tried troubleshooting with the official documentation and forums, but it's still a dead end for me.

What steps did I miss to make sure all my Elasticsearch nodes are in sync and in the same cluster? I could really use some help here. Thanks in advance.

The most likely explanation is that you auto-bootstrapped it. As per the docs at the bottom of that page, you will need to wipe it and start again with a corrected config.

You should also delete the following options from your config files:

Hi, thanks for the reply. What exactly needs to be wiped, and on which nodes? I'm sorry, I don't quite understand.

Also, would removing all the parameters above prevent Elasticsearch from auto-bootstrapping? And do I remove them only on node 2, or on all nodes?

Thanks in advance.

Setting discovery.seed_hosts will prevent auto-bootstrapping; you must not have had it set the first time you started the new node.
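
For reference, a minimal 7.x discovery section for a node that should join an existing cluster can be as small as this (a sketch only, using the hostnames from this thread):

```yaml
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
# cluster.initial_master_nodes is only consulted the very first time a
# brand-new cluster bootstraps itself; it should be removed once the
# cluster exists.
# discovery.zen.minimum_master_nodes is a 6.x setting that 7.x ignores.
```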

The data path on the new node.
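
That is, the directory set by path.data. A minimal sketch of the wipe, assuming the default RPM data path shown in your config (/var/lib/elasticsearch) — double-check path.data in your own file first:

```shell
#!/usr/bin/env bash
# Sketch: reset the rebuilt node's on-disk cluster state (run as root on node-2).
# ES_DATA mirrors the path.data value from the configs above; adjust if yours differs.
ES_DATA="${ES_DATA:-/var/lib/elasticsearch}"

wipe_es_data() {
    # ${ES_DATA:?} makes the shell abort if the variable is empty,
    # so this can never expand to rm -rf "/"*
    rm -rf "${ES_DATA:?}/"*
}

# On the node itself you would run:
#   systemctl stop elasticsearch && wipe_es_data && systemctl start elasticsearch
```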

I have commented out these parameters and wiped the data on the new node, but when I start Elasticsearch I get these errors:

[2021-04-02T10:06:28,724][INFO ][o.e.n.Node               ] [node-2] initialized
[2021-04-02T10:06:28,725][INFO ][o.e.n.Node               ] [node-2] starting ...
[2021-04-02T10:06:28,737][INFO ][o.e.x.s.c.PersistentCache] [node-2] persistent cache index loaded
[2021-04-02T10:06:28,844][INFO ][o.e.t.TransportService   ] [node-2] publish_address {192.168.23.91:9300}, bound_addresses {[::]:9300}
[2021-04-02T10:06:28,981][INFO ][o.e.b.BootstrapChecks    ] [node-2] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-04-02T10:06:38,995][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:48,997][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:58,999][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:59,003][WARN ][o.e.n.Node               ] [node-2] timed out while waiting for initial discovery state - timeout: 30s
[2021-04-02T10:06:59,027][INFO ][o.e.h.AbstractHttpServerTransport] [node-2] publish_address {192.168.23.91:9200}, bound_addresses {[::]:9200}
[2021-04-02T10:06:59,031][INFO ][o.e.n.Node               ] [node-2] started
[2021-04-02T10:07:09,001][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$

and it seems that node-2 failed to join the other nodes:

[2021-04-02T10:19:26,309][INFO ][o.e.c.c.JoinHelper       ] [node-2] failed to join {node-1}{eOdelnSTSsela0Y9qp9uQQ}{1AeSF_NoS66QJBwoHnDB2Q}{192.168.23.90}{192.168.23.90:9300}{cdfhilmrstw}$
org.elasticsearch.transport.RemoteTransportException: [node-1][192.168.23.90:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [node-2][192.168.23.91:9300] connect_exception

Good, a different error is progress :)

Now you need to work out why it can't join the other nodes. You've truncated the log messages you shared, but the full messages should help you.

Hi, I can't seem to figure out why it won't join the other nodes. Below are the highlights of the log messages:

[root@node-2 ~]# tail -n 50 /var/log/elasticsearch/central-log-umt.log
[2021-04-05T13:49:41,565][INFO ][o.e.c.c.JoinHelper       ] [node-2] failed to join {node-1}{eOdelnSTSsela0Y9qp9uQQ}{33KNrK03RJe6gUtn6HBS1w}{192.168.23.90}{192.168.23.90:9300}{cdfhilmrstw}{ml.machine_memory=33679245312, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=17179869184, transform.node=true} with JoinRequest{sourceNode={node-2}{QUO7Mfr-SYukq1QdXDgvHA}{2kj6De9RQUuVJWaEejTcow}{192.168.23.91}{192.168.23.91:9300}{cdfhilmrstw}{ml.machine_memory=33680547840, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=1073741824}, minimumTerm=155, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [node-1][192.168.23.90:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [node-2][192.168.23.91:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:968) ~[elasticsearch-7.12.0.jar:7.12.0]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:202) ~[elasticsearch-7.12.0.jar:7.12.0]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:31) ~[elasticsearch-core-7.12.0.jar:7.12.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:46) ~[elasticsearch-core-7.12.0.jar:7.12.0]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:57) ~[transport-netty4-client-7.12.0.jar:7.12.0]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.IOException: No route to host: 192.168.23.91/192.168.23.91:9300
Caused by: java.io.IOException: No route to host
        at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
        at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]
[root@node-2 ~]#

I know it says no route to host, but I have tested the connection using Netcat (nc), and the output says it connected successfully:

[root@node-2 ~]# nc -zv 192.168.23.91 9300
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.23.91:9300.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[root@node-2 ~]# nc -zv 192.168.23.91 9200
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.23.91:9200.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[root@node-2 ~]#

Also, when I now make a curl request from the new node, I get this error:

[root@node-2 ~]# curl -k --user admin -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'admin':
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
[root@node-2 ~]#

Do you have any idea how I can fix this, please?

No, but I can confirm that this is a connectivity problem, nothing to do with Elasticsearch itself.

The prompt indicates that you're running this command on node-2, whereas node-1 is the node reporting the "No route to host" error. I suggest you try the same thing from node-1.
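
For example, a quick TCP probe you could run from node-1 (a sketch; it assumes bash is available and uses the transport port 9300, which is what cluster joins go over):

```shell
#!/usr/bin/env bash
# Sketch: probe the Elasticsearch transport port from the machine that logs
# "No route to host" (here, node-1 probing node-2). It shells out to bash's
# /dev/tcp so the result doesn't depend on which nc variant is installed.
probe() {
    local host="${1:?usage: probe <host> [port]}" port="${2:-9300}"
    if bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
    fi
}

# From node-1 you would run, e.g.:
#   probe 192.168.23.91 9300
# A firewall rule that REJECTs (the usual cause of an instant "No route to
# host") shows up here as "unreachable", even though the same probe from
# node-2 to itself succeeds.
```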

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.