We have a 7.10.2 cluster with multiple data-only nodes and 3 master-only nodes.
We wiped one of the 3 masters clean (reinstall), and now it refuses to join the cluster:
master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{node0008.example.com}{8ITgJeCmTCOqah6b_kxMDQ}{aGmuukVWTx6PzBlxPp0oaA}{10.10.239.8}{10.10.239.8:9300}{m}]; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, [2010:660:5009:84:10:10:239:8]:9300, 10.10.239.53:9300, [2010:660:5009:84:10:10:239:53]:9300, 10.10.234.11:9300, [2010:660:5009:304:10:10:234:11]:9300, 10.10.235.58:9300, [2010:660:5009:304:10:10:235:58]:9300, 10.10.239.135:9300, [2010:660:5009:84:10:10:239:153]:9300, 10.10.234.4:9300, [2010:660:5009:304:10:10:234:4]:9300, 10.10.234.23:9300, [2010:660:5009:304:10:10:234:23]:9300, 10.10.234.115:9300, [2010:660:5009:304:10:10:234:115]:9300, 10.10.234.114:9300, [2010:660:5009:304:10:10:234:114]:9300, 10.10.234.129:9300, [2010:660:5009:304:10:10:234:129]:9300] from hosts providers and [{node0008.example.com}{8ITgJeCmTCOqah6b_kxMDQ}{aGmuukVWTx6PzBlxPp0oaA}{10.10.239.8}{10.10.239.8:9300}{m}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
I tried with and without the cluster.initial_master_nodes setting (specifying all 3 master nodes).
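For reference, a minimal sketch of the kind of config I mean (hostnames other than node0008.example.com are placeholders, not our real names):

```yaml
# Sketch of the discovery-related settings on the reinstalled master-only node.
# Only node0008.example.com comes from the logs; the other hostnames are placeholders.
node.roles: [ master ]
discovery.seed_hosts:
  - node0008.example.com:9300
  - master-2.example.com:9300
  - master-3.example.com:9300
# Tried both with and without this block (listing all 3 master-eligible nodes):
cluster.initial_master_nodes:
  - node0008.example.com
  - master-2.example.com
  - master-3.example.com
```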
Looks like a discovery problem, but 7.10.2 is very old (long past EOL), and newer versions have much better support for troubleshooting this kind of thing, so I recommend upgrading ASAP.
I don't remember exactly how 7.10.2 behaves in this situation, but hopefully there's something in the logs to help. Also double-check your discovery config and inter-node connectivity.
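For example, a quick way to confirm that the transport port on each master-eligible node is reachable from the node that won't join (hostnames below are examples, and this assumes the default transport port 9300):

```sh
# Sketch: check TCP reachability of the transport port from the non-joining node.
# Hostnames are examples; 9300 is the default transport port.
for host in node0008.example.com master-2.example.com master-3.example.com; do
  nc -vz "$host" 9300
done
```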
The logs also show these SSL errors:
Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
exception caught on transport layer [Netty4TcpChannel{localAddress=0.0.0.0/0.0.0.0:42374, remoteAddress=null}], closing connection
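In case it helps anyone else, the handshake can also be probed directly with openssl s_client against the transport port (host, port, and CA file path below are examples, not my actual paths):

```sh
# Sketch: attempt a TLS handshake against the transport port to reproduce the
# failure outside Elasticsearch. Host and CA path are examples.
openssl s_client -connect node0008.example.com:9300 \
  -CAfile /etc/elasticsearch/certs/ca.crt </dev/null
```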
I just tried adding another master node (a fourth one), and it joined the cluster with no issues. Is it possible the one that got reinstalled has some stale config left in the cluster?
Thanks for trying to help and for asking me to share the full logs. While they contained no clues beyond the ones I already posted, the request did prompt me to chase down the root cause of the SSL exceptions, which turned out to be the reason the node refused to join the cluster.
The root cause was a mismatched X.509 certificate and private key.
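For anyone else running into this: one way to confirm whether a certificate and key actually belong together is to compare the public key each contains; the two digests below should be identical (file paths are examples):

```sh
# Sketch: extract the public key from the certificate and from the private key
# and compare their digests. Paths are examples -- point at your transport cert/key.
openssl x509 -in /etc/elasticsearch/certs/transport.crt -noout -pubkey | openssl sha256
openssl pkey -in /etc/elasticsearch/certs/transport.key -pubout | openssl sha256
```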