Cross Cluster Search gives strange results when remote nodes become unavailable


(Maxim) #1

Hi.

The problem
I've configured Elasticsearch for Cross Cluster Search (CCS). It works well in most cases, but I can't get correct behaviour when a remote node becomes unavailable (its network cable is unplugged): CCS requests hang.
The expected behaviour is to get results from the CCS nodes that are still available.

The CCS simplified deployment scheme
node1/cluster1[1 shard]/index1
node2/cluster2[1 shard]/index1

The desired behaviour
node1 can do CCS requests to index1 of node1 and node2
node2 can do CCS requests to index1 of node1 and node2

Observed behaviour
Case 1

  1. node1, node2: ES is running
  2. node2: unplug the network cable
  3. node1: the CCS query to node2:index1 hangs

Case 2

  1. node2: ES is running, but the node is unreachable from node1 (cable unplugged)
  2. node1: ES is restarted
  3. node1: make a CCS query to node2:index1 and get a result (w/o node2 data)
  4. node2: plug the network cable back in
  5. node1: make a CCS query to node2:index1 and get a result (w/ node2 data)
  6. node2: unplug the network cable
  7. node1: the CCS query to node2:index1 hangs
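
While the hang in the last step of both cases persists, the caller can at least bound its own wait with a client-side socket timeout. A rough sketch, assuming Python's standard library and that node1 serves HTTP on the default port 9200 (not stated in this post); ccs_search_url and search_with_timeout are hypothetical helper names. This does not make Elasticsearch return partial results; it only stops the client from blocking indefinitely.

```python
import json
import socket
import urllib.error
import urllib.request


def ccs_search_url(base_url, targets):
    """Build a search URL from (cluster_alias, index) pairs.

    An alias of None means a local index; otherwise the CCS
    "alias:index" syntax is used.
    """
    parts = [index if alias is None else f"{alias}:{index}"
             for alias, index in targets]
    return f"{base_url.rstrip('/')}/{','.join(parts)}/_search"


def search_with_timeout(url, query, timeout_s=5.0):
    """POST a search and return the parsed JSON, or None on no answer.

    timeout_s applies to each blocking socket operation (connect, read),
    so it is not a strict end-to-end deadline.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError:
        raise  # the node answered, but with an error status
    except (socket.timeout, urllib.error.URLError):
        return None  # connection failed or nothing arrived in time


if __name__ == "__main__":
    # Assumed address: node1 on localhost, default HTTP port.
    url = ccs_search_url("http://127.0.0.1:9200",
                         [("node1", "index1"), ("node2", "index1")])
    print(url)
    print(search_with_timeout(url, {"query": {"match_all": {}}}, timeout_s=5.0))
```

A None result here only means the client gave up; the request may still be pending on the Elasticsearch side.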

The used configuration
node1.elasticsearch.yml

cluster.name: cluster1
node.name: cluster1-node1
network.bind_host: ["127.0.0.1", "<Node1_IPv4>"]
network.publish_host: ["<Node1_IPv4>"]

node1.cluster.settings

{"persistent": { "search": { "remote": {
    "node1": {
      "skip_unavailable": "true",
      "seeds": ["127.0.0.1:9300"]
    },
    "node2": {
      "skip_unavailable": "true",
      "seeds": ["<Node2_IPv4>:9300"]
    }
}}}}

The node2 settings are similar.
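
For reference, the same settings body can be generated rather than written by hand. A minimal sketch, assuming Python; build_remote_settings is a hypothetical helper, the search.remote.* keys are the ones shown above, and the resulting JSON would be sent with PUT /_cluster/settings. It emits skip_unavailable as a JSON boolean rather than the string "true" used above.

```python
import json


def build_remote_settings(remotes, skip_unavailable=True):
    """Return a cluster-settings body registering remote clusters.

    remotes maps a cluster alias to its list of transport seed
    addresses ("host:port").
    """
    remote = {}
    for alias, seeds in remotes.items():
        remote[alias] = {
            "skip_unavailable": skip_unavailable,
            "seeds": list(seeds),
        }
    return {"persistent": {"search": {"remote": remote}}}


if __name__ == "__main__":
    # Placeholders kept as in the post; substitute real addresses.
    body = build_remote_settings({
        "node1": ["127.0.0.1:9300"],
        "node2": ["<Node2_IPv4>:9300"],
    })
    print(json.dumps(body, indent=2))
```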

Environment
Linux
ES 6.2.2
Java 9.X

Thank you in advance.


(Luca Cavanna) #2

Hi Maxim,
see "Elasticsearch CCS: client get timeout when remote cluster is isolated by firewall" for a similar recent discussion. Elasticsearch is currently not very quick at detecting when remote clusters are down, and that may make requests hang. We have been working on improving that, and some changes in that direction will go out with version 6.6.

One thing you could do on your end is tweak OS settings as suggested by David in the linked discussion; see also https://github.com/elastic/elasticsearch/issues/34405#issuecomment-429465972 .

Cheers
Luca


(Maxim) #3

Ok.
Thank you, Luca.