Unable to do Cross Cluster Searching in AWS with Instances in Separate Regions

Hello,

I have two clusters installed on EC2 instances in separate AWS regions, and I am trying to use the cross-cluster search feature to query both clusters from one central location. When I try to connect to cluster 2 from cluster 1, the connection fails. netstat on cluster 2 shows that a connection is established, but it drops a couple of seconds later. The Elasticsearch logs on cluster 1 show a connection timeout and that it was trying to connect to the private IP address of cluster 2, even though the remote cluster is configured with the public IP address. The instances can definitely reach each other: I can ping one from the other, and there is that brief connection. I also made sure that ports 9200 and 9300 on each cluster are open to the other. Is there something I am missing that is preventing cluster 1 from connecting to cluster 2?

Logs showing initial connection being made to cluster 2

ubuntu@ip-172-31-21-75:~$ netstat -no | grep 50.17.68.163
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33370      ESTABLISHED keepalive (7198.12/0/0)
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33366      ESTABLISHED keepalive (7198.12/0/0)
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33364      ESTABLISHED keepalive (7198.12/0/0)
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33368      ESTABLISHED keepalive (7198.12/0/0)
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33372      ESTABLISHED keepalive (7198.12/0/0)
tcp6       0      0 172.31.21.75:9300       50.17.68.163:33374      ESTABLISHED keepalive (7198.12/0/0)

Logs showing timeout errors on cluster 1

[2020-10-20T14:58:39,067][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,068][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,069][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,069][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,070][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,071][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,072][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,109][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,110][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,111][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,111][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,112][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,112][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:39,114][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.mode] from [SNIFF] to [sniff]
[2020-10-20T14:58:39,114][INFO ][o.e.c.s.ClusterSettings  ] [ip-172-31-83-214] updating [cluster.remote.cluster_test.seeds] from [[]] to [["52.9.148.235:9300"]]
[2020-10-20T14:58:49,117][WARN ][o.e.t.RemoteClusterService] [ip-172-31-83-214] failed to connect to new remote cluster cluster_test within 10s
[2020-10-20T14:59:09,310][WARN ][o.e.t.SniffConnectionStrategy] [ip-172-31-83-214] fetching nodes from external cluster [cluster_test] failed
org.elasticsearch.transport.ConnectTransportException: [ip-172-31-21-75][172.31.21.75:9300] connect_timeout[30s]
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:972) ~[elasticsearch-7.9.2.jar:7.9.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) ~[elasticsearch-7.9.2.jar:7.9.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]

Pinging is indeed helpful. Can you telnet though?

on cluster1: telnet <cluster2.IP> <cluster.public-port>
on cluster2: telnet <cluster1.IP> <cluster.public-port>

Just to rule out network problems - maybe it will only work one way, or one will drop while the other doesn't, etc.
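If telnet isn't installed on the instances, here is a minimal sketch of the same check using bash's built-in `/dev/tcp` device instead; `<CLUSTER2_PUBLIC_IP>` is a placeholder, and 9300 is the transport port that cross-cluster search actually uses (9200 being only the HTTP port):

```shell
#!/usr/bin/env bash
# Minimal TCP reachability check (sketch; <CLUSTER2_PUBLIC_IP> is a placeholder).
# Cross-cluster search connects over the transport port (9300 by default),
# so that is the port worth probing, not just 9200.

check_port() {
  local host=$1 port=$2
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout caps the wait
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example (run from cluster 1, and the mirror-image check from cluster 2):
# check_port <CLUSTER2_PUBLIC_IP> 9300
```

Running it in both directions rules out an asymmetric security-group rule, the same way the telnet checks above do.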

EDIT: If you could post the configs of both clusters that would also be super useful. Feel free to edit out IP addresses but replace them with descriptive placeholders e.g. PUBLIC_IP, PRIVATE_IP.

Yes, I was able to use telnet to reach both clusters in both directions.

Here is the elasticsearch config of cluster 1:

# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: <PRIVATE_IP>
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: [PRIVATE_IP]
#To save
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

This is the Kibana config of cluster 1:

# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: <PRIVATE_IP>

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://<PRIVATE_IP>:9200"]

This is the elasticsearch config of cluster 2:

# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: <PRIVATE_IP_CLUSTER2>
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: [PRIVATE_IP_CLUSTER2]
#To save
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

This is the Kibana config of cluster 2:

# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: <PRIVATE_IP_CLUSTER2>

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://<PRIVATE_IP_CLUSTER2>:9200"]

Thanks!

I managed to figure it out from the discussion here: "Elasticsearch EC2 setup across multiple regions". The solution is the last comment in that post.

In the elasticsearch.yml file for cluster 1, I added the following lines:

network.publish_host: <CLUSTER1_PUBLIC_IP>
discovery.zen.ping.unicast.hosts: ["<CLUSTER2_PUBLIC_IP>"]

In the elasticsearch.yml file for cluster 2, I added the following lines:

network.publish_host: <CLUSTER2_PUBLIC_IP>
discovery.zen.ping.unicast.hosts: ["<CLUSTER1_PUBLIC_IP>"]

I then restarted Elasticsearch, and the clusters were able to see and connect to each other as remote clusters. From there I was able to run cross-cluster searches on both clusters.
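In case it helps anyone verifying a similar setup: once the seeds are configured, the remote connection can be checked from Kibana Dev Tools with the remote-info API (the `cluster_test` alias below is the one from the logs above; the exact response fields may vary by version):

```
GET _remote/info
```

If the connection is healthy, the response for `cluster_test` should show `"connected": true` along with the configured seeds and the number of connected nodes.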

This is a terrible idea: it involves exposing these unsecured clusters to the public internet. It's only a matter of time before someone deletes all your data, or worse.

Even with security enabled you should still not expose clusters to the internet. It's almost certainly a mistake to even have a public IP on the instances running Elasticsearch nodes.

Instead, you should set up VPC peering between your two regions so that the two clusters can see each other without also exposing them to the rest of the world.
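For reference, once VPC peering is in place (including route-table entries and security-group rules allowing port 9300 between the two VPCs), the remote cluster can be registered with its private IP via a persistent cluster setting instead of exposing anything publicly. This is a sketch using the `cluster_test` alias from the logs above and a placeholder private IP; `network.publish_host` should then be left at (or reverted to) the private address:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_test": {
          "seeds": ["<CLUSTER2_PRIVATE_IP>:9300"]
        }
      }
    }
  }
}
```

With sniff mode (the default, as shown in the logs), the nodes' published addresses must also be reachable from the other cluster, which the peering connection provides.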


Hello David,

Thanks for pointing that out. I'm new to Elasticsearch and thought I understood what the configuration was doing, but thank you for explaining what it actually does and why I shouldn't do it.

Also thank you for suggesting a solution. I will try to implement this instead of what I found in that link.