Master node not trusting node certificate

Hello,

I am writing this post because my previous post was closed due to inactivity (here).

We are trying to create a cluster with one master node and one data node. So far we have had no success, because we keep running into errors we are not able to fix.

Some context:

  • We are using the same Elasticsearch version on both servers: 7.6.2
  • The SSL/TLS configuration was done following this article here
  • The master has never belonged to another cluster

Here is what our elasticsearch.yml looks like on the master node:

cluster.name: goulue
node.name: master
node.master: true

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: goulue.icopartners.com
http.max_content_length: 100mb

discovery.seed_hosts: ["95.179.139.6", "127.0.0.1", "goulue.icopartners.com"]

cluster.initial_master_nodes: ["master"]

xpack.license.self_generated.type: "basic"

xpack.security.enabled: true

xpack.security.http.ssl.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.key: certificates/master.key
xpack.security.http.ssl.certificate: certificates/master.crt
xpack.security.http.ssl.certificate_authorities: certificates/ca.crt
xpack.security.transport.ssl.key: certificates/master.key
xpack.security.transport.ssl.certificate: certificates/master.crt
xpack.security.transport.ssl.certificate_authorities: certificates/ca.crt

Here is the same file for our data node:

cluster.name: goulue
node.name: ico-elastic-node-2
node.data: true
node.master: false

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: goulue-node.icopartners.com

discovery.seed_hosts: ["goulue.icopartners.com"]

cluster.initial_master_nodes: ["master"]

xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.key: certificates/ico-elastic-node-2.key
xpack.security.http.ssl.certificate: certificates/ico-elastic-node-2.crt
xpack.security.http.ssl.certificate_authorities: certificates/ca.crt
xpack.security.transport.ssl.key: certificates/ico-elastic-node-2.key
xpack.security.transport.ssl.certificate: certificates/ico-elastic-node-2.crt
xpack.security.transport.ssl.certificate_authorities: certificates/ico-elastic-node-2.crt

When we restart the servers, we get this error:

failed to establish trust with server at [goulue.icopartners.com]; the server provided a certificate with subject name [CN=master] and fingerprint [7ee7d7501e635ec16480c0b99641f207c465cf6c]; the certificate has subject alternative names [DNS:goulue.icopartners.com]; the certificate is issued by [CN=Elastic Certificate Tool Autogenerated CA] but the server did not provide a copy of the issuing certificate in the certificate chain; this ssl context ([xpack.security.transport.ssl]) is not configured to trust that issuer
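
For what it's worth, the chain the master actually presents on the transport port can be inspected from the data node with openssl (hostname and port are taken from the error above; this is just a diagnostic sketch):

openssl s_client -connect goulue.icopartners.com:9300 -showcerts </dev/null

If the chain is complete, the output should show both the node certificate and the issuing CA certificate.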

Here is the instance.yml file that the article mentioned above suggests creating:

instances:
  - name: 'master'
    dns: [ 'goulue.icopartners.com' ]
  - name: "ico-elastic-node-2"
    dns: [ 'goulue-node.icopartners.com' ]
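
For reference, the article generates a CA and then one certificate per instance from this file with elasticsearch-certutil, roughly like this (the paths below are illustrative, not necessarily the exact ones we used):

bin/elasticsearch-certutil ca --pem --out ca.zip
unzip ca.zip
bin/elasticsearch-certutil cert --pem --in instance.yml --ca-cert ca/ca.crt --ca-key ca/ca.key --out certs.zip

The resulting per-instance .crt/.key files and ca.crt are what the certificates/ paths in elasticsearch.yml point at.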

We tried setting verification_mode: certificate on the master node, but then we get this error:

[master] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/95.179.140.41:9300, remoteAddress=/95.179.154.158:56350}

And the cluster can't be formed. What are we doing wrong? How can we fix this?

Thank you so much.

Your data node has the following setting for trust:

xpack.security.transport.ssl.certificate_authorities: certificates/ico-elastic-node-2.crt

This is likely the problem. It is configured to trust only its own certificate. Based on your other configuration, you might want to change the value to certificates/ca.crt.
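
That is, on the data node:

xpack.security.transport.ssl.certificate_authorities: certificates/ca.crt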

Thank you for your response. However, changing that gives us a different error this time:

 last failed join attempt was 8.7s ago, failed to join {master}{lQo0td3QSru2lZXERkLXfQ}{jXPO_Tb-TvKL5-DXhXqX4g}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={ico-elastic-node-2}{d6bXnIITQg6uVlDfb8VxzQ}{QA-dhNV8S2W5hgnzRdEd4g}{goulue-node.icopartners.com}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}


 [ico-elastic-node-2] master not discovered yet: have discovered [{ico-elastic-node-2}{d6bXnIITQg6uVlDfb8VxzQ}{QA-dhNV8S2W5hgnzRdEd4g}{goulue-node.icopartners.com}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [95.179.140.41:9300] from hosts providers and [] from last-known cluster state; node term 147, last-accepted version 0 in term 0

Do you get this error on the data node?

Yes, this error is on the data node.

Have you checked the cluster UUID on both nodes?
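
For example, the root endpoint of each node reports it (adjust hosts and credentials to your setup):

curl -k -u elastic 'https://goulue.icopartners.com:9200/'
curl -k -u elastic 'https://goulue-node.icopartners.com:9200/'

Both should return the same cluster_uuid once the nodes have formed a cluster.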

Yes, only the master node has a UUID. The data node has _na_ as the value of cluster_uuid.

That's interesting...
So the master node really was not found.
Please try:

cd /usr/share/elasticsearch/bin
sudo ./elasticsearch-node detach-cluster

The tool will ask for confirmation:

Do you want to proceed?
Confirm [y/N] y

and then:

sudo systemctl start elasticsearch

I don't know what that does, but the message we get on the data node when we run it sounds unsafe.

We can't afford to lose data on the master node by running this.

This tool can cause arbitrary data loss and its use should be your last resort.

Hi again. Here is my full log file after we solved the certificate issue I posted about previously:

[2022-08-23T14:27:25,090][INFO ][o.e.t.TransportService   ] [ico-elastic-node-2] publish_address {95.179.154.158:9300}, bound_addresses {[::]:9300}
[2022-08-23T14:27:25,347][INFO ][o.e.b.BootstrapChecks    ] [ico-elastic-node-2] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2022-08-23T14:27:35,362][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ico-elastic-node-2] master not discovered yet: have discovered [{ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [95.179.140.41:9300] from hosts providers and [] from last-known cluster state; node term 156, last-accepted version 0 in term 0
[2022-08-23T14:27:45,365][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ico-elastic-node-2] master not discovered yet: have discovered [{ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [95.179.140.41:9300] from hosts providers and [] from last-known cluster state; node term 156, last-accepted version 0 in term 0
[2022-08-23T14:27:55,368][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ico-elastic-node-2] master not discovered yet: have discovered [{ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [95.179.140.41:9300] from hosts providers and [] from last-known cluster state; node term 156, last-accepted version 0 in term 0
[2022-08-23T14:27:55,371][WARN ][o.e.n.Node               ] [ico-elastic-node-2] timed out while waiting for initial discovery state - timeout: 30s
[2022-08-23T14:27:55,394][INFO ][o.e.h.AbstractHttpServerTransport] [ico-elastic-node-2] publish_address {95.179.154.158:9200}, bound_addresses {[::]:9200}
[2022-08-23T14:27:55,395][INFO ][o.e.n.Node               ] [ico-elastic-node-2] started
[2022-08-23T14:27:55,932][INFO ][o.e.c.c.JoinHelper       ] [ico-elastic-node-2] failed to join {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=156, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][95.179.140.41:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [ico-elastic-node-2][95.179.154.158:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2022-08-23T14:27:55,941][INFO ][o.e.c.c.JoinHelper       ] [ico-elastic-node-2] failed to join {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=156, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][95.179.140.41:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [ico-elastic-node-2][95.179.154.158:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2022-08-23T14:28:05,370][INFO ][o.e.c.c.JoinHelper       ] [ico-elastic-node-2] last failed join attempt was 9.4s ago, failed to join {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=156, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, targetNode={master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][95.179.140.41:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [ico-elastic-node-2][95.179.154.158:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.6.2.jar:7.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.2.jar:7.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2022-08-23T14:28:05,373][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ico-elastic-node-2] master not discovered yet: have discovered [{ico-elastic-node-2}{RAOWxCxLRiugEl83rPZ3Mg}{jOUI2yapRL2QLm-x1bQUKA}{95.179.154.158}{95.179.154.158:9300}{dil}{ml.machine_memory=12558602240, xpack.installed=true, ml.max_open_jobs=20}, {master}{lQo0td3QSru2lZXERkLXfQ}{VIX8Mw8CQJ2h_Ag3K7S-sQ}{goulue.icopartners.com}{95.179.140.41:9300}{dilm}{ml.machine_memory=33548009472, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [95.179.140.41:9300] from hosts providers and [] from last-known cluster state; node term 156, last-accepted version 0 in term 0

Could it be that there is some kind of connectivity issue between the master and the data node?

What ports do you use for HTTP and transport?

I did not change that configuration, so I guess it should be using the defaults, 9200/9300.

It should.
I prefer to set it explicitly, even when it's going to be the default value.
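
For example, in elasticsearch.yml (these are the defaults, just spelled out):

http.port: 9200
transport.port: 9300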

The error mentions a timeout. What about the firewall? Are the ports open?
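
A quick way to check from the data node, assuming netcat is available:

nc -vz goulue.icopartners.com 9300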

So I just checked, and it seems port 9300 was not allowed through the firewall on the data node. After opening that port, the data node has joined the master!
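
(For anyone hitting the same issue: allowing the transport port looks roughly like this with ufw; adjust for whatever firewall you use.)

sudo ufw allow 9300/tcp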

That's nice. Please double-check the cluster UUID on both nodes.
