Elasticsearch remote clusters not connecting

Hello everyone,
Due to a recent interest in my workplace, I've been trying to connect 2 of my clusters for DR purposes, but no matter what I do, the 2 clusters will just not connect, at all.
At first, I tried connecting them while leaving the "transport.port" setting untouched. The clusters didn't connect, and the only piece of information I got was the tooltip on the little question mark next to "Not connected", which said: "Ensure the seed nodes are configured with the remote cluster's transport port, not the http port".
So for my second attempt I set "transport.port" manually to 9300, and for my third attempt to 9340. Both attempts produced the same "Not connected" status and the same message next to it as the first attempt.
For my last attempt, I tried setting "transport.port" to a range of ports rather than a single port. Needless to say, that didn't work either.

At this point, I'm quite lost and have no clue where to go from here, and would really appreciate any kind of assistance.
For general information, I'm running these clusters in different datacenters located in different areas, on an on-premise network. Both clusters are running Elasticsearch 7.5.1, and as far as I can tell, our firewalls aren't blocking the communication between them.

Thank you for reading, and I hope you have a great day.

Eyal

I think these are remote clusters @Christian_Dahlqvist, i.e. for CCR or CCS, so a connection should work.

The UI hint about the transport port doesn't sound relevant here. It's a common mistake, but if you are using the default port of 9300 everywhere then it's not that, and things will be much simpler if you stick to the default port.

There should be more details of the problem in the logs. Can you see any relevant-looking messages from around the time that you set the connections up?

The logs show me this. I don't think it's anything specific, but maybe you can understand it better than me, @DavidTurner:

[2020-02-26T11:24:15,533][WARN ][o.e.t.RemoteClusterService] [cluster1es] failed to connect to new remote cluster cluster2 within 10s
[2020-02-26T11:24:35,536][WARN ][o.e.t.RemoteClusterConnection] [cluster1es] fetching nodes from external cluster [cluster2] failed
org.elasticsearch.transport.ConnectTransportException: [][XXX.XXX.XXX.XXX:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.5.1.jar:7.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-02-26T11:24:35,538][WARN ][o.e.t.RemoteClusterService] [cluster1es] failed to update seed list for cluster: cluster2
org.elasticsearch.transport.ConnectTransportException: [][XXX.XXX.XXX.XXX:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:995) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[elasticsearch-7.5.1.jar:7.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]

Yes, that's useful indeed.

This node is trying to connect to XXX.XXX.XXX.XXX:9300 but that attempt is timing out, which indicates a connectivity issue. I expect you can reproduce this by running "curl http://XXX.XXX.XXX.XXX:9300/" from this node. This should immediately respond with "This is not an HTTP port". If it doesn't, then you need to investigate your network config.
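
If curl isn't available on the node, the same check can be sketched in Python using only the standard library. This is just an illustration: the function name probe_tcp and the 5-second default timeout are my own choices, not anything from Elasticsearch. It only tells you whether a TCP connection to the transport port can be opened at all. A timeout often points to a firewall or routing problem silently dropping packets, while a refusal usually means nothing is listening on that port.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to open a TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout, e.g. packets dropped along the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except OSError as exc:
        # Anything else: DNS failure, unreachable network, and so on.
        return f"error: {exc}"
```

Calling probe_tcp("XXX.XXX.XXX.XXX", 9300) from the node that logged the exception should return "open" once connectivity is fixed. A result of "timeout" would match the connect_timeout in the log above.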

I see, I'll check up on that and tag you in a reply.
Thank you very much.


It appears that I am not getting that kind of response at all.
Running curl from cluster1 to cluster2 gives me "connection timed out".
From cluster2 to cluster1 I get "Empty reply from server".
Just in case, I also checked in Chrome and Firefox. In Chrome I received the message "cluster2node didn't send any data", and in Firefox I just received "The connection was reset".
Very sorry to trouble you, but could you point me to some documentation so I can try to fix this issue?
Either way, thank you very much.

Just to check, are you using security (i.e. TLS) on the transport layer? If so, I should have told you to use curl -k https://XXX.XXX.XXX.XXX:9300/ to test the connection instead.

It's hard to give any guidance for next steps here - establishing connectivity between things is outside the scope of this forum, but maybe you have a local sysadmin or network admin person who can offer some help in this area?

Both my clusters are only configured with xpack.security.transport and not with http security.
And I'll ask about that, thank you!
If it's not an issue, I'll tag you if I find some sort of fix or have any kind of update.
Thank you very much!

OK, if you have xpack.security.transport.ssl.enabled: true then you will need to use curl -k https://XXX.XXX.XXX.XXX:9300/ (we need https here because we must open an encrypted connection to the transport port).
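
As a rough Python equivalent of the curl -k check, the sketch below opens a TCP connection and then attempts a TLS handshake with certificate verification switched off. The name probe_tls and the result strings are hypothetical, chosen for illustration. Note one assumption: if the transport layer is configured to require client certificates, the handshake may fail here even when the network path is fine, so treat a "tls failed" result as a hint rather than a diagnosis.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake without verifying the server certificate.

    Returns "tls ok: <protocol>", "timeout", "tls failed: <details>",
    or "tcp failed: <details>".
    """
    ctx = ssl.create_default_context()
    # Like curl -k: skip hostname and certificate checks (testing only).
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the TLS handshake on the open connection.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return f"tls ok: {tls.version()}"
    except socket.timeout:
        return "timeout"
    except ssl.SSLError as exc:
        # TCP connected, but the TLS handshake itself was rejected or broken.
        return f"tls failed: {exc}"
    except OSError as exc:
        # Could not connect, or the connection was dropped at the TCP level.
        return f"tcp failed: {exc}"
```

A "timeout" or "tcp failed" result means the problem is still at the network level, and certificates have not come into play yet.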

Well, since I haven't configured HTTPS security, the response I get is "Peer reports incompatible or unsupported protocol version."
I'll continue checking around for a fix, but for now, thank you very much!

Well, the issue was the certificate.
I didn't know I had to make a set of certificates for both clusters together. That seems to have solved it for me.
Either way, thank you very much for all your help.

I don't think a bad certificate would cause a connection timeout, so maybe you fixed the connectivity problem and then ran into a second issue to do with certificates?

The documentation does say that you need to establish a trust relationship between the clusters either by using the same CA for both or by configuring each cluster's CA to be trusted by the other cluster.
