Two clusters unable to communicate for remote search


(srirama rayaprolu) #1

On my local development servers, we created two clusters, each with a single node.
Each cluster's status was made green by setting the number of replicas to 0.

Below are the cluster settings.

{
  "persistent": {
    "search": {
      "remote": {
        "new-cluster": {
          "seeds": [
            "172.25.24.160:9300"
          ]
        },
        "nc3": {
          "seeds": [
            "172.25.25.237:9300"
          ]
        }
      }
    }
  },
  "transient": {}
}
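For reference, settings like these are normally applied through the cluster settings API (PUT _cluster/settings). A minimal Python sketch, assuming the 5.x `search.remote.<alias>.seeds` keys shown above. The helper name `build_remote_settings` is made up for illustration, and the request is only built, not sent:

```python
import json
import urllib.request


def build_remote_settings(seeds_by_alias):
    """Return a persistent cluster-settings body holding remote cluster seeds."""
    remote = {alias: {"seeds": list(seeds)} for alias, seeds in seeds_by_alias.items()}
    return {"persistent": {"search": {"remote": remote}}}


body = build_remote_settings({
    "new-cluster": ["172.25.24.160:9300"],
    "nc3": ["172.25.25.237:9300"],
})

# Build (but do not send) the PUT request. The target host is an assumption:
# use whichever node should hold the remote cluster definitions.
request = urllib.request.Request(
    "http://172.25.25.237:9200/_cluster/settings",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`.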

When I checked the network-level settings, I found no issues.

When I send the search query from Kibana, the remote cluster search works normally.

However, when the search request is sent via the REST API, it reports that it is unable to communicate with the remote cluster, as shown below.

org.elasticsearch.transport.TransportException: unable to communicate with remote cluster [new-cluster]
        at org.elasticsearch.action.search.RemoteClusterService$1.onFailure(RemoteClusterService.java:286) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:67) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.ActionListener.onFailure(ActionListener.java:101) ~[elasticsearch-5.4.0.jar:5.4.0]

Below are the logs from Elasticsearch startup:

[2019-03-14T15:40:48,618][WARN ][o.e.a.s.RemoteClusterService] [node-3] failed to update seed list for cluster: new-cluster
org.elasticsearch.transport.ConnectTransportException: [node-1][10.0.30.48:9300] connect_timeout[30s]
        at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:359) ~[?:?]
        at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:526) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:465) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:315) ~[elasticsearch-5.4.0.jar:5.4.0]

Here 10.0.30.48:9300 is the internal IP address. ELK is running on cloud servers.
In the elasticsearch.yml file, network.host is set as below.

network.host: 10.0.30.48

The Elasticsearch version is 5.4.0 on both clusters.
Below is the curl command output.

curl 172.25.24.160:9200
{
  "name" : "node-1",
  "cluster_name" : "new-cluster",
  "cluster_uuid" : "CWn8aXerToK2bc1O4VBrjw",
  "version" : {
    "number" : "5.4.0",
    "build_hash" : "780f8c4",
    "build_date" : "2017-04-28T17:43:27.229Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.0"
  },
  "tagline" : "You Know, for Search"
}
curl 172.25.25.237:9200
{
  "name" : "node-3",
  "cluster_name" : "nc3",
  "cluster_uuid" : "28hSUEv8T3aSS5MN16sZVg",
  "version" : {
    "number" : "5.4.0",
    "build_hash" : "780f8c4",
    "build_date" : "2017-04-28T17:43:27.229Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.0"
  },
  "tagline" : "You Know, for Search"
}

The search request URI is the same in both cases.
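The URI itself is not shown in this thread. As a reminder of its general shape, a cross-cluster search addresses the remote index as `<alias>:<index>`. A small Python sketch, in which the index name `my-index` and the helper `remote_search_url` are hypothetical:

```python
def remote_search_url(host, alias, index, port=9200):
    """Build a cross-cluster search URL of the form /<alias>:<index>/_search."""
    return f"http://{host}:{port}/{alias}:{index}/_search"


# The request goes to the local cluster (nc3), which forwards it to new-cluster.
print(remote_search_url("172.25.25.237", "new-cluster", "my-index"))
# prints http://172.25.25.237:9200/new-cluster:my-index/_search
```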


(Henning Andersen) #2

For this to work, I believe the two clusters must be able to reach each other on the IP address listed in network.host. Does it work if you use the public IP instead?


(srirama rayaprolu) #3

Thanks for the reply.

Yes, these ELK clusters are running behind a firewall/proxy. The public IP address is translated to the internal IP.
The curl command on the public IP works and returns the output shown earlier. The addresses are:

    cluster name   Internal IP   External IP     network.host in yml file
    nc3            10.0.21.92    172.25.25.237   10.0.21.92
    new-cluster    10.0.30.48    172.25.24.160   10.0.30.48

The curl output shared earlier was obtained using the public IPs.

If I put the public IP in network.host, ELK does not come up because it cannot bind to that IP.


(Henning Andersen) #4

Hi @srirama,

is port 9300 open between the clusters? You should be able to test that by logging on to one of the nodes and running

telnet ip 9300

against the other node's IP. This should succeed; otherwise, they will not be able to communicate (and thus cannot do cross-cluster search).
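If telnet is not installed on the node, the same reachability check can be sketched in Python with a plain TCP connect. The function name `port_is_open` is made up for this example, and it only tests that the port accepts connections, not that Elasticsearch answers on it:

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (run on one node against the other node's address):
#   port_is_open("172.25.24.160", 9300)
#   port_is_open("10.0.30.48", 9300)
```

Running it against both the public and the internal address of the other node shows which of the two is actually reachable.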


(srirama rayaprolu) #5

Hi @HenningAndersen

Yes, these nodes can reach each other on the public IPs on ports 9200 and 9300. I verified this with telnet.


(Henning Andersen) #6

Hi @srirama,

thanks for providing this. Can we repeat the telnet exercise against the internal ip addresses too?


(srirama rayaprolu) #7

Hi @HenningAndersen:

They cannot reach each other on the internal IPs. They are accessible only via the public IPs.


(Henning Andersen) #8

Hi @srirama,

OK. For a development setup, you could probably fix this by using the public IP address as the publish_host on each cluster, i.e. set in elasticsearch.yml:

network.publish_host: 172.25.25.237   # on nc3
network.publish_host: 172.25.24.160   # on new-cluster
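Some background on why publish_host matters here, going by the startup log earlier in the thread: the seed is only the first contact point, and the connection is then made to the address the remote node publishes (the log shows a timeout to [node-1][10.0.30.48:9300], not to the seed 172.25.24.160:9300). One way to check what each node publishes is the nodes info API (GET _nodes/transport). A minimal Python sketch, assuming the response carries `nodes.<id>.transport.publish_address`; the sample response is hand-written and trimmed, and the node ID is a placeholder:

```python
def publish_addresses(nodes_info):
    """Map node name -> transport publish address from a nodes-info response."""
    return {
        node["name"]: node["transport"]["publish_address"]
        for node in nodes_info.get("nodes", {}).values()
    }


def not_matching_seeds(nodes_info, seeds):
    """Return nodes whose published transport address is not one of the seeds."""
    return {
        name: addr
        for name, addr in publish_addresses(nodes_info).items()
        if addr not in seeds
    }


# Illustrative, trimmed response for new-cluster before the change (not real output).
sample = {
    "nodes": {
        "placeholder-node-id": {
            "name": "node-1",
            "transport": {"publish_address": "10.0.30.48:9300"},
        }
    }
}

print(not_matching_seeds(sample, ["172.25.24.160:9300"]))
# prints {'node-1': '10.0.30.48:9300'}
```

With network.publish_host set to the public IP, the published address should match the seed and the result should be empty.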


(srirama rayaprolu) #9

Hi @HenningAndersen:
Yes, it is working now.

After changing the configuration I restarted ELK. Whichever cluster node was restarted last connected to the other, but at the same time the seeds in the other node's remote info became empty. Why is that?

So I updated the cluster settings again on the node where the seeds had become empty, and then the clusters were joined. Now both are up and joined.
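To spot this condition after a restart, the remote info response (GET _remote/info) can be checked for aliases with an empty seed list. A minimal Python sketch, assuming each alias in the response has a `seeds` list as described above; the sample data is hypothetical and does not record which cluster actually lost its seeds:

```python
def aliases_without_seeds(remote_info):
    """Return the remote cluster aliases whose seed list is missing or empty."""
    return sorted(
        alias for alias, info in remote_info.items() if not info.get("seeds")
    )


# Hypothetical, trimmed remote info response (not real output from this setup).
sample = {
    "new-cluster": {"seeds": ["172.25.24.160:9300"]},
    "nc3": {"seeds": []},
}

print(aliases_without_seeds(sample))
# prints ['nc3']
```

Any alias it reports would need its seeds re-applied through the cluster settings, as was done here.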

Also, if the internal IPs were reachable from each other, which settings would need to change? Would I only need to provide the internal IPs in the cluster definition, or would other settings need to be tuned?

Thanks for the information.


(srirama rayaprolu) #10

Please see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#advanced-network-settings for an explanation of why the above setting was suggested.

Also, if the internal IPs are reachable from each other, no configuration changes are required other than specifying network.host.

@HenningAndersen, please correct me if I am wrong.


(Henning Andersen) #11

Hi @srirama,

yes, AFAIK that is correct. If the nodes can reach each other on the internal IPs, you should be able to remove the network.publish_host setting.

Figuring out why the remote seeds disappeared will take some more digging, I think, preferably including reproducing this on your end with a more precise description of the steps. Is it important to you to get this clarified?