New Elasticsearch node does not sync with existing cluster

Hi, I previously had a 3-node Elasticsearch cluster consisting of node 1, node 2, and node 3, but node 2 ran into some OS-related issues and had to be reformatted.

I reinstalled and reconfigured node 2 exactly as it was before the format, but somehow node 2 is not joining the existing cluster and has formed a new Elasticsearch cluster instead. All my nodes are running the same version, 7.12.0.

When I list the nodes from the primary/master node using curl, it only shows node 1 and node 3:

[root@node-1 ~]# curl -k --user elastic -X GET "https://node-1:9200/_cat/nodes?pretty"
Enter host password for user 'elastic':
192.168.23.90 44 99 4 0.00 0.03 0.02 cdfhilmrstw * node-1
192.168.23.92 52 98 1 0.00 0.03 0.05 cdfhilmrstw - node-3
[root@node-1 ~]#

I am able to view the cluster status for node 1:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-1:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 914,
  "active_shards" : 1820,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

and also node 3:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-3:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 914,
  "active_shards" : 1820,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[root@node-1 ~]#

When I check the cluster status for node 2, I get "no route to host" instead:

[root@node-1 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'elastic':
curl: (7) Failed connect to node-2:9200; No route to host

What confuses me is that when I check the cluster status directly from node 2, rather than from the existing primary node, I get a reply instead of "no route to host":

[root@node-2 ~]# curl -k --user elastic -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'elastic':
{
  "cluster_name" : "umtcentral-log",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[root@node-2 ~]#

Below are snippets of my elasticsearch.yml file for node 1:

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log

# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: node-1
node.master: true

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

elasticsearch.yml for node 2:

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log
node.data: true

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-2
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

Lastly, elasticsearch.yml for node 3:

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: umtcentral-log
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-3
node.name: node-3
node.data: true

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true

# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: "0.0.0.0"
network.bind_host: "0.0.0.0"
network.publish_host: "0.0.0.0"

#
# Set a custom port for HTTP:
#
#http.port: 9200
http.host: "0.0.0.0"

# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

I have had this issue for days and have tried troubleshooting with the official documentation and forums, but it's still a dead end for me.

What steps did I miss to make sure all my Elasticsearch nodes are in sync and in the same cluster? I could really use some help here. Thanks in advance.

The most likely explanation is that you auto-bootstrapped it. As per the docs at the bottom of that page, you will need to wipe it and start again with a corrected config.

You should also delete the following options from your config files:

Hi, thanks for the reply. What exactly needs to be wiped, and on which nodes? I'm sorry, I don't quite understand.

Also, would removing all the parameters above prevent Elasticsearch from auto-bootstrapping? And do I remove them only on node 2, or on all nodes?

Thanks in advance.

Setting discovery.seed_hosts will prevent auto-bootstrapping; you must not have had it set the first time you started the new node.
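
For reference, a minimal 7.x discovery section for a node that should join an existing cluster can be as small as this (a sketch only, using the hostnames from this thread):

```yaml
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
# cluster.initial_master_nodes is only consulted the very first time a
# brand-new cluster bootstraps itself; it should be removed once the
# cluster exists.
# discovery.zen.minimum_master_nodes is a 6.x setting that 7.x ignores.
```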

The data path on the new node.
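
That is, the directory set by path.data. A minimal sketch of the wipe, assuming the default RPM data path shown in your config (/var/lib/elasticsearch) — double-check path.data in your own file first:

```shell
#!/usr/bin/env bash
# Sketch: reset the rebuilt node's on-disk cluster state (run as root on node-2).
# ES_DATA mirrors the path.data value from the configs above; adjust if yours differs.
ES_DATA="${ES_DATA:-/var/lib/elasticsearch}"

wipe_es_data() {
    # ${ES_DATA:?} makes the shell abort if the variable is empty,
    # so this can never expand to rm -rf "/"*
    rm -rf "${ES_DATA:?}/"*
}

# On the node itself you would run:
#   systemctl stop elasticsearch && wipe_es_data && systemctl start elasticsearch
```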

I have commented out these parameters and wiped the data on the new node, but when I start Elasticsearch I get these errors:

[2021-04-02T10:06:28,724][INFO ][o.e.n.Node               ] [node-2] initialized
[2021-04-02T10:06:28,725][INFO ][o.e.n.Node               ] [node-2] starting ...
[2021-04-02T10:06:28,737][INFO ][o.e.x.s.c.PersistentCache] [node-2] persistent cache index loaded
[2021-04-02T10:06:28,844][INFO ][o.e.t.TransportService   ] [node-2] publish_address {192.168.23.91:9300}, bound_addresses {[::]:9300}
[2021-04-02T10:06:28,981][INFO ][o.e.b.BootstrapChecks    ] [node-2] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-04-02T10:06:38,995][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:48,997][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:58,999][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$
[2021-04-02T10:06:59,003][WARN ][o.e.n.Node               ] [node-2] timed out while waiting for initial discovery state - timeout: 30s
[2021-04-02T10:06:59,027][INFO ][o.e.h.AbstractHttpServerTransport] [node-2] publish_address {192.168.23.91:9200}, bound_addresses {[::]:9200}
[2021-04-02T10:06:59,031][INFO ][o.e.n.Node               ] [node-2] started
[2021-04-02T10:07:09,001][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.i$

and it seems that node-2 failed to join the other nodes:

[2021-04-02T10:19:26,309][INFO ][o.e.c.c.JoinHelper       ] [node-2] failed to join {node-1}{eOdelnSTSsela0Y9qp9uQQ}{1AeSF_NoS66QJBwoHnDB2Q}{192.168.23.90}{192.168.23.90:9300}{cdfhilmrstw}$
org.elasticsearch.transport.RemoteTransportException: [node-1][192.168.23.90:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [node-2][192.168.23.91:9300] connect_exception

Good, a different error is progress :)

Now you need to work out why it can't join the other nodes. You've truncated the log messages you shared, but the full messages should help you.

Hi, I can't seem to figure out why it won't join the other nodes. Below are the highlights of the log messages:

[root@node-2 ~]# tail -n 50 /var/log/elasticsearch/central-log-umt.log
[2021-04-05T13:49:41,565][INFO ][o.e.c.c.JoinHelper       ] [node-2] failed to join {node-1}{eOdelnSTSsela0Y9qp9uQQ}{33KNrK03RJe6gUtn6HBS1w}{192.168.23.90}{192.168.23.90:9300}{cdfhilmrstw}{ml.machine_memory=33679245312, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=17179869184, transform.node=true} with JoinRequest{sourceNode={node-2}{QUO7Mfr-SYukq1QdXDgvHA}{2kj6De9RQUuVJWaEejTcow}{192.168.23.91}{192.168.23.91:9300}{cdfhilmrstw}{ml.machine_memory=33680547840, xpack.installed=true, transform.node=true, ml.max_open_jobs=20, ml.max_jvm_size=1073741824}, minimumTerm=155, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [node-1][192.168.23.90:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [node-2][192.168.23.91:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:968) ~[elasticsearch-7.12.0.jar:7.12.0]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:202) ~[elasticsearch-7.12.0.jar:7.12.0]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:31) ~[elasticsearch-core-7.12.0.jar:7.12.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:46) ~[elasticsearch-core-7.12.0.jar:7.12.0]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:57) ~[transport-netty4-client-7.12.0.jar:7.12.0]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.IOException: No route to host: 192.168.23.91/192.168.23.91:9300
Caused by: java.io.IOException: No route to host
        at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
        at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]
[root@node-2 ~]#

I know it says no route to host, but I have tested the connection using Netcat (nc), and the output says it connected successfully:

[root@node-2 ~]# nc -zv 192.168.23.91 9300
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.23.91:9300.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[root@node-2 ~]# nc -zv 192.168.23.91 9200
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.23.91:9200.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[root@node-2 ~]#

Also, when I now make a curl request from the new node, I get this error:

[root@node-2 ~]# curl -k --user admin -H 'Content-Type: application/json' -XGET https://node-2:9200/_cluster/health?pretty
Enter host password for user 'admin':
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
[root@node-2 ~]#

Do you have any idea how I can fix this, please?

No, but I can confirm that this is a connectivity problem, nothing to do with Elasticsearch itself.

The prompt indicates that you're running this command on node-2, whereas node-1 is the node reporting the "No route to host" error. I suggest you try the same thing from node-1.
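
For example, a quick TCP probe you could run from node-1 (a sketch; it assumes bash is available and uses the transport port 9300, which is what cluster joins go over):

```shell
#!/usr/bin/env bash
# Sketch: probe the Elasticsearch transport port from the machine that logs
# "No route to host" (here, node-1 probing node-2). It shells out to bash's
# /dev/tcp so the result doesn't depend on which nc variant is installed.
probe() {
    local host="${1:?usage: probe <host> [port]}" port="${2:-9300}"
    if bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
    fi
}

# From node-1 you would run, e.g.:
#   probe 192.168.23.91 9300
# A firewall rule that REJECTs (the usual cause of an instant "No route to
# host") shows up here as "unreachable", even though the same probe from
# node-2 to itself succeeds.
```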

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.