Adding Elasticsearch Node to a Cluster

Hi Team,

I have an Elasticsearch cluster of 25 nodes, all in one subnet (call it subnet-a). I added every node to the cluster using ./elasticsearch-create-enrollment-token -s node followed by the reconfigure command, which internally sets up the HTTP and transport certificates.

Now I have to add a few more nodes from subnet-b. (The subnet-b servers can reach the subnet-a servers over their public IPs, and vice versa.)

My 1st try
I installed ES version 8.12.2 (the same as the existing 25-node cluster) and configured:
discovery.seed_hosts: ["a.b.c.d:9300", "x.y.z.w:9300", .......... ] # all 25 machine public IPs with 9300 port

cluster.initial_master_nodes: ["a.b.c.d"] # public IP of the server that was master at the time

This did not work, and I got the following error:

[2024-04-03T11:08:54,144][WARN ][o.e.c.c.ClusterFormationFailureHelper] [servers1] master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes ["a.b.c.d"] to bootstrap a cluster: have discovered [{servers1}{-ygZoPCGTo6iGlVlRXoKeA}{Qu1Q7lw8T0SplCfXNV4Qag}{servers1}{1.2.3.4}{1.2.3.4:9300}{cdfhilmrstw}{8.12.2}{7000099-8500010}]; discovery will continue using [ "a.b.c.d:9300", "x.y.z.w:9300" ] from hosts providers and [{servers1}{-ygZoPCGTo6iGlVlRXoKeA}{Qu1Q7lw8T0SplCfXNV4Qag}{servers1}{1.2.3.4}{1.2.3.4:9300}{cdfhilmrstw}{8.12.2}{7000099-8500010}] from last-known cluster state; node term 0, last-accepted version 0 in term 0; for troubleshooting guidance, see Troubleshooting discovery | Elasticsearch Guide [8.12] | Elastic

Seeing this error, I thought the new node could not communicate with the existing cluster because of the transport and HTTP certificates. Hence:

My 2nd try
I copied transport.p12 and the HTTP certificate (generated by Elasticsearch during the initial setup) into the /etc/elasticsearch/certs folder on the new node and restarted.

This time I got the error below:

[2024-04-03T12:13:04,321][ERROR][o.e.b.Elasticsearch ] [servers1] fatal exception while booting Elasticsearch
org.elasticsearch.ElasticsearchSecurityException: failed to load SSL configuration [xpack.security.transport.ssl] - cannot read configured [PKCS12] keystore [/etc/elasticsearch/certs/transport.p12] - this is usually caused by an incorrect password
.......................
Caused by: org.elasticsearch.common.ssl.SslConfigException: cannot read configured [PKCS12] keystore [/etc/elasticsearch/certs/transport.p12] - this is usually caused by an incorrect password
......................
Caused by: java.io.IOException: keystore password was incorrect
.......................
Caused by: java.security.UnrecoverableKeyException: failed to decrypt safe contents entry: javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.

I would like to know whether there is any way to view or use the password for the transport keystore and truststore. If not, advice on the best way to add the node would really help me.

See these docs, particularly the bit that says "Do not use this setting when restarting a cluster or adding a new node to an existing cluster."

(may not be the root cause of your problem, but definitely a mistake)
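As a rough sketch of what that implies for the new node's elasticsearch.yml: the discovery section lists seed hosts only and leaves cluster.initial_master_nodes out entirely. The addresses are the placeholders from the question, and the cluster name shown is hypothetical (it has to match whatever the existing cluster uses).

```yaml
# elasticsearch.yml on the node being added (sketch, not a verified config).
# "my-cluster" is a hypothetical name; it must match the existing cluster.
cluster.name: my-cluster

# Placeholder addresses taken from the question. Any master-eligible nodes
# reachable from this node can be listed here.
discovery.seed_hosts: ["a.b.c.d:9300", "x.y.z.w:9300"]

# cluster.initial_master_nodes is deliberately absent: it is only meant for
# bootstrapping a brand-new cluster, not for joining an existing one.
```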

Hi David,

Thanks for your reply.

Yes, the initial master nodes setting is only useful when the cluster first starts; after that it has no meaning. I forgot to mention that in my 1st try I included only the seed hosts setting. The log then showed the node failing to connect to or discover the master node, so I tried the initial master nodes setting to point at the master, but discovery still failed.

Now I am thinking of trying the following.

I think the problem is with the transport certificates.

My conclusions:
When I set up the 25-node cluster, I used the HTTP and transport SSL certificates that Elasticsearch generates by default on first start (in /etc/elasticsearch/certs/*).
[reconfigure-node.html#_description_6]
This worked for all the nodes, and they joined my cluster just fine. I also checked how this works and found that the enrollment token we generate contains the version, node IP, fingerprint, and key, as shown in the attached image.

[image]

Now I am trying to add a node from a different subnet. The same enrollment token will not work here, because the IP in the enrollment token is a private IP. (If I could create an enrollment token that contains the public IP, I could probably use it for other subnets as well. I need to find a way to do this.)

So I went for the manual way of adding nodes to the cluster, by putting the public IPs in the discovery seed hosts setting.

But now I think the problem is the existing transport security layer: because I am using different certificates from the existing cluster, it will not allow me to connect. If I disable the transport security layer, I get the same issue.
(Correct me if I am wrong here; this is my assumption.)

My next steps:

I will do a small POC: create my own certificates, signed by a CA that I generate with the Elastic tools, so that I have the passwords and certificates properly under my control. I will apply these certificates to 2-3 nodes and try to form a cluster with them. If that works (I hope it does!), I will replace the certificates on the existing 25-node cluster with the new ones and then add further nodes using the same certificates.

Please tell me whether my understanding is right or whether I have missed anything. You could also suggest the right way to replace the certificates on an existing cluster without data loss.

Thanks in advance.

Each node needs to be visible to all other nodes at the same address. You can't have some nodes connecting via a private IP and others via a public IP. See these docs for more information.
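One way this is often expressed in elasticsearch.yml is sketched below, assuming the node has a single address that every other node can reach. The IP is a documentation-range placeholder, not a value from this thread, and whether this layout suits your network is something only you can confirm.

```yaml
# elasticsearch.yml (sketch under the assumptions stated above)

# Bind on all local interfaces.
network.host: 0.0.0.0

# The one address this node advertises to other nodes for transport
# (node-to-node) traffic. 203.0.113.10 is a placeholder; every node in the
# cluster must be able to reach the real value on the transport port.
transport.publish_host: 203.0.113.10
```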

That said, it's hard to offer much concrete help here. It sounds like you're struggling with discovery issues, although even this isn't totally clear. These docs describe how to troubleshoot discovery, including information on the log messages on which to focus. If you need help interpreting the logs please share them here.


When enrolling a node into an existing cluster, the configuration process (which may be either reconfigure-node or elasticsearch --enrollment-token, depending on the type of package you are using) needs to be able to communicate over HTTPS with one of the addresses for the seed node (as found in the adr field of the token).
If that's not possible then enrollment will fail.

The enrollment token takes the HTTP addresses from the node's bound addresses and published address, as returned from GET _nodes/_local/http.

In general we expect that a node's published HTTP address is accessible to all possible clients. It's not strictly an error if that's not the case, but things like client sniffing will fail if your nodes are publishing HTTP addresses that are only accessible on some subnets.
So, your first port of call might be to go and update the HTTP addresses of your nodes so that they're publishing addresses that can be accessed on all subnets.
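A minimal sketch of that change on an existing node, assuming it has an address reachable from every subnet (the IP below is a documentation-range placeholder, not one from this thread):

```yaml
# elasticsearch.yml on an existing (seed) node (sketch, not a verified config)

# Address advertised to HTTP clients. 203.0.113.10 is a placeholder for an
# address that clients on all subnets can reach on the HTTP port.
http.publish_host: 203.0.113.10
```

After restarting the node, GET _nodes/_local/http should report the new published address, and tokens generated afterwards should pick it up from there.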

Then enrollment might work. Or at least the first step will work - the part where the new node calls a REST API on the existing (seed) node in order to obtain the required certificates & configuration it needs to join the cluster.

Actually joining the cluster is the next part ...

You should not need to assume anything. If there is a connection issue between your nodes then the logs should be clear about what that is.
If the problem is that you have mismatched CAs then the error will state that there is a problem with trust (probably PKIX path building failed).
If the problem is with the IP addresses in the certs (which is unlikely because auto-configuration does not enable strict hostname checking) then you will see an error related to that.

Hi David,

I understand your concern. Please confirm one thing:

For my existing setup, if I replace the private IPs with public IPs (assuming ports 9200 and 9300 are reachable from each node to the others via the public IPs), will it affect the cluster in any way?

The plan is to change network.host: instead of 0.0.0.0, I will set the public IP.
[image]

In discovery.seed_hosts, I will replace the private IPs with the public IPs,

and then restart the nodes one by one after the changes.

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.