Node can't rejoin the cluster (after reboot)

lecko · August 22, 2018, 8:35am

Hello,

I am running Elasticsearch 5.6.6 with 5 nodes on bare-metal servers. Everything ran fine for many months, then I wanted to test how the cluster behaves without one node.
So I stopped node sn5 and also rebooted it. All the data was moved to the other 4 nodes without problems, and the cluster status is OK.
But when I started Elasticsearch on sn5 again, it could not rejoin the cluster. I got these errors in the log:

[2018-08-22T10:03:29,956][INFO ][o.e.b.BootstrapChecks ] [sn5] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-08-22T10:03:59,984][WARN ][o.e.n.Node ] [sn5] timed out while waiting for initial discovery state - timeout: 30s
[2018-08-22T10:03:59,997][INFO ][o.e.h.n.Netty4HttpServerTransport] [sn5] publish_address {x.x.x.208:9200}, bound_addresses {x.x.x.208:9200}
[2018-08-22T10:03:59,997][INFO ][o.e.n.Node ] [sn5] started
[2018-08-22T10:04:00,324][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [sn5] no known master node, scheduling a retry
[2018-08-22T10:04:01,944][DEBUG][o.e.a.a.i.g.TransportGetIndexAction] [sn5] no known master node, scheduling a retry
[2018-08-22T10:04:03,320][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [sn5] no known master node, scheduling a retry

Also this status is returned:
curl -XGET 'http://x.x.x.208:9200/_cat/health?v&pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
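
For anyone who wants to watch for this state from a script instead of re-running curl by hand, here is a minimal sketch that recognises the error body shown above. The function name `master_missing` is made up for illustration; it only inspects a response body you have already fetched and makes no request itself.

```python
import json


def master_missing(body):
    """Return True if a response body reports master_not_discovered_exception.

    `body` is the raw text returned by the node, e.g. from the curl call above.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. the plain-text table _cat/health prints on success.
        return False
    if not isinstance(doc, dict):
        return False
    error = doc.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("type") == "master_not_discovered_exception"
```

A polling loop around this (fetch, check, sleep) would show when sn5 finally sees a master, but it does not explain why discovery fails.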

I found many similar cases on this forum. Most of the suggestions are to check connectivity with telnet to ports 9200 and 9300 in both directions, but telnet works fine in my environment.
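
The telnet check can be scripted so it is easy to repeat from every node. This is only a sketch: `port_open` and `check_hosts` are illustrative names, and the host list has to be filled in with the real addresses (they are masked in this post). Like telnet, a successful TCP connect only shows that the port is reachable, not that discovery itself works.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hosts(hosts, ports=(9200, 9300)):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}


# Example (addresses are placeholders, run it on sn5 and again on another node):
# for target, ok in check_hosts(["<node-1-ip>", "<node-2-ip>"]).items():
#     print(target, "open" if ok else "NOT reachable")
```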

The elasticsearch.yml configuration is the same as on the other nodes, nothing special:

cluster.name: prod
bootstrap.memory_lock: true
discovery.zen.ping.unicast.hosts: ["x.x.x.204:9300", "x.x.x.205:9300", "x.x.x.206:9300", "x.x.x.207:9300", "x.x.x.208:9300"]
discovery.zen.minimum_master_nodes: 3
network.host: x.x.x.208
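
As a sanity check on discovery.zen.minimum_master_nodes: the usual guidance for zen discovery is (master-eligible nodes / 2) + 1. The sketch below assumes all 5 nodes are master-eligible, which the config above does not state explicitly; the function names are illustrative.

```python
def minimum_master_nodes(master_eligible):
    """Quorum size for zen discovery: (master-eligible nodes / 2) + 1, rounded down."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible, minimum):
    """True if enough master-eligible nodes are visible to elect or keep a master."""
    return visible_master_eligible >= minimum


quorum = minimum_master_nodes(5)      # assumption: all 5 nodes are master-eligible
print(quorum)                         # 3, matching the setting above
print(can_elect_master(4, quorum))    # True: the 4 remaining nodes keep a master
print(can_elect_master(1, quorum))    # False: sn5 alone cannot elect itself
```

So the value 3 looks right for 5 nodes, and a node that cannot see the others is expected to sit waiting with "no known master node" rather than form its own cluster.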

Another test I did was to stop the master node sn1, so that the master role moved to another server, sn4. After restarting Elasticsearch on the original master sn1, it rejoined the cluster without problems. I don't want to reboot sn1 as well right now, because the reboot is the one thing I did differently with sn5. First I would like to get sn5 back into the cluster.

Thanks for any ideas.

lecko · August 27, 2018, 12:23pm

I did 2 more tests.

First, I completely removed Elasticsearch from node sn5 and reinstalled it.
The problem remained the same.
Second, I freshly installed a new Elasticsearch node, N1, on the same network as the other nodes, and tried to form a new cluster from N1 and the "problematic" node sn5. It worked at once without problems.

So it seems that something in the Elasticsearch configuration is preventing sn5 from joining.

