Cross-cluster search questions

Hello,

We are trying to replace our tribe node setup with a cross-cluster search setup, and we have multiple questions regarding the general configuration and monitoring of a cross-cluster-search cluster.
Unfortunately, the documentation does not cover many aspects of the inner workings of cross-cluster search.

Could you please answer the following questions:

  1. Am I understanding correctly that gateway nodes are being chosen ONLY at the cluster registration moment?

  2. How are those gateway nodes chosen? Is there an algorithm for this?

  3. Is the chosen gateway node list propagated across all nodes of the cross-cluster-search cluster, or does each node generate its own gateway node list separately?

  4. Are the search requests going through the gateway nodes balanced somehow (round-robin? some other algorithm?), or will all search requests from a cross-cluster search node end up on one gateway node while the other two act as a backup?

  5. What happens if one or two of the chosen gateway nodes become unresponsive? Should the cross-cluster-search node that has lost connections to those gateway nodes eventually choose different gateway nodes?
    5a. If not - how can we make a node re-select gateway nodes and re-establish the connections?
    5b. If yes - how quickly should it discover the loss of connectivity? Can this value be configured?

  6. Is there a way to monitor which specific gateway nodes have been chosen by a cross-cluster-search node? As far as I can understand, the "http_addresses" field has been removed in Elasticsearch 7.x.

  7. Is there a difference between

    "persistent" : {
     "cluster" : {
       "remote" : {
    

and

 "persistent" : {    
   "search" : {
      "remote" : {

locations for configuring cross-cluster-search settings?
As far as we can see, Elasticsearch 6.8.3 allows the same cross-cluster-search settings to be configured in both locations simultaneously.
Which setting will take precedence if we have contradictory cross-cluster-search settings in both locations?
Am I understanding correctly that

 "persistent" : {    
   "search" : {
      "remote" : {

should be used going forward?
8. Is there a way to completely remove a remote cluster from the cross-cluster search configuration, or to completely clear the existing CCS configuration from a CCS cluster?
As far as we were able to see, there is currently no way in Elasticsearch 6.8.3 to completely clear either

"persistent" : {
    "cluster" : {
      "remote" : {

or

 "persistent" : {    
   "search" : {
      "remote" : {

We were only able to empty the seeds list for the clusters we no longer need.

Could you please also help us understand the state our test cross-cluster-search cluster is in right now?

We have a cluster of 4 cross-cluster-search nodes on Elastic 6.8.3.

ip           node.role name
10.48.37.18  mdi       ams01-c02-kcs10
10.49.37.16  mdi       zrh01-c02-kcs10
10.13.137.84 mdi       sjc01-c02-kcs10
10.14.36.143 mdi       iad01-c02-kcs10

We have the following cluster settings for one of the remote clusters:

{
  "persistent" : {
    "cluster" : {
      "remote" : {
        <...>
        "aws71-c01" : {
          "skip_unavailable" : "true",
          "seeds" : [
            "10.3.6.63:9301",
            "10.3.6.66:9301",
            "10.3.6.85:9301"
          ],
          "transport" : {
            "ping_schedule" : "30s"
          }
        },
        <...>

However, the output of _remote/info for different nodes in the cluster shows different "num_nodes_connected" values.

See, for example, the node ams01-c02-kcs10:

curl -s 10.48.37.18:9200/_remote/info?pretty | grep -A 16 "aws71-c01"
  "aws71-c01" : {
    "seeds" : [
      "10.3.6.66:9301",
      "10.3.6.85:9301",
      "10.3.6.63:9301"
    ],
    "http_addresses" : [
      "10.3.6.66:9201",
      "10.3.6.63:9201",
      "10.3.6.85:9201"
    ],
    "connected" : true,
    "num_nodes_connected" : 3,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : true
  },

and for the node iad01-c02-kcs10:

curl -s 10.14.36.143:9200/_remote/info?pretty | grep -A 16 "aws71-c01"
  "aws71-c01" : {
    "seeds" : [
      "10.3.6.66:9301",
      "10.3.6.85:9301",
      "10.3.6.63:9301"
    ],
    "http_addresses" : [
      "10.3.6.66:9201",
      "10.3.6.85:9201",
      "10.3.6.124:9201"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : true
  },

If we restart the iad01-c02-kcs10 node, we can probably get num_nodes_connected back to 3, but we were not able to influence the number by any other means (including sending search requests).
Is this the only way of restoring the correct number of connections? Or are we wrong to use num_nodes_connected as a monitoring metric for the state of cross-cluster search?
What steps can we take to investigate the issue further?

  1. Am I understanding correctly that gateway nodes are being chosen ONLY at the cluster registration moment?

No. Gateway nodes are determined during a connection attempt. The local cluster will connect to a configured seed node and request the cluster state. From the cluster state it will extract the remote nodes and initiate connections (up to the maximum connections_per_cluster). As the documentation indicates, dedicated master nodes are not eligible.

If you want some level of control over which nodes are chosen, you can configure the node attributes.

As described in the documentation:

A node attribute to filter out nodes that are eligible as a gateway node in the remote cluster. For instance a node can have a node attribute node.attr.gateway: true such that only nodes with this attribute will be connected to if cluster.remote.node.attr is set to gateway

In this particular case, you would set cluster.remote.node.attr: gateway. And then on the remote cluster you would set node.attr.gateway: true on the nodes that you want to be gateway nodes.
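
A minimal sketch of that configuration, assuming both pieces go into elasticsearch.yml (the attribute name "gateway" is just the example used in the docs):

# On the remote cluster, on each node that should be eligible as a gateway
# (requires a restart of those nodes):
node.attr.gateway: true

# On the CCS cluster nodes, so that only remote nodes carrying that attribute are connected to:
cluster.remote.node.attr: gateway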

  2. How are those gateway nodes chosen? Is there an algorithm for this?

Elasticsearch iterates through the nodes in the remote cluster and makes connection attempts to the eligible nodes. It stops making connection attempts once the configured number of connections are open.

  3. Is the chosen gateway node list propagated across all nodes of the cross-cluster-search cluster, or does each node generate its own gateway node list separately?

Each node does the connection process on its own.

  4. Are the search requests going through the gateway nodes balanced somehow (round-robin? some other algorithm?), or will all search requests from a cross-cluster search node end up on one gateway node while the other two act as a backup?

Round robin, unless Elasticsearch determines that one of the gateway nodes holds the shard it is requesting data from, in which case it directs the request to that node.

  5. What happens if one or two of the chosen gateway nodes become unresponsive? Should the cross-cluster-search node that has lost connections to those gateway nodes eventually choose different gateway nodes?

If a connection closes, the node runs the connection process again: it requests the cluster state, iterates through the remote nodes (excluding those it is already connected to), and makes connection attempts.

5b. If yes - how quickly should it discover the loss of connectivity? Can this value be configured?

It discovers that the connection is dead when the socket closes. TCP keep-alive pings can be configured to ensure connections are kept active.
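
For what it's worth, the per-remote-cluster transport.ping_schedule (already set to 30s for aws71-c01 earlier in this thread) schedules regular pings on these connections; a sketch of setting it dynamically, reusing that alias:

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "aws71-c01": {
          "transport": { "ping_schedule": "30s" }
        }
      }
    }
  }
}'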

  6. Is there a way to monitor which specific gateway nodes have been chosen by a cross-cluster-search node? As far as I can understand, the "http_addresses" field has been removed in Elasticsearch 7.x.

No, this would be a feature request, although we have been talking about adding more network-level monitoring inside Elasticsearch. Until such enhancements arrive, you will need to monitor this at the infrastructure level.
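
Until then, one crude cross-check at the OS level (my own suggestion, not an Elasticsearch feature) is to list the established outbound connections from a CCS node to the remote cluster's transport port:

# 9301 is the transport port used by the seeds in this thread; adjust as needed
ss -tn state established '( dport = :9301 )'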


Hi Alexey,
Tim has already answered all the questions around gateway selection and fault detection.

I would like to also point you to the remote clusters docs where some of this info is described:
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/modules-remote-clusters.html. Let us know what may be missing there that you would have found useful to read.

It is true that from the remote info API you cannot see which nodes are acting as gateways. I'd ask why you need to know; if you do, it sounds like you may want more control over which nodes are selected, and that is possible, as Tim mentioned, by configuring CCS with specific node attributes.

The cluster.remote category of settings has replaced the search.remote category, as remote clusters are now also used by cross-cluster replication, while they were initially only used for CCS. As with all the breaking changes we make, both settings are available for every minor release of the major version, so that the deprecated setting can be replaced with the new one; in the following major version the deprecated setting (search.remote) is removed. I would recommend using cluster.remote.
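
For example, adding (or updating) a remote cluster under the new namespace looks like this (a sketch reusing the aws71-c01 alias and seeds quoted earlier in this thread):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "aws71-c01": {
          "seeds": ["10.3.6.63:9301", "10.3.6.66:9301", "10.3.6.85:9301"],
          "skip_unavailable": true
        }
      }
    }
  }
}'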

If you want to remove a remote cluster, you just need to set its seeds to null. This is also described in the remote clusters docs that I linked above.
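
Concretely, something like this should clear the aws71-c01 entry (a sketch; null out whichever optional settings are defined for that alias):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster": {
      "remote": {
        "aws71-c01": {
          "seeds": null,
          "skip_unavailable": null,
          "transport": { "ping_schedule": null }
        }
      }
    }
  }
}'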

Looking at your setup, indeed the number of connections should be three in the second case too. Are all nodes on the same version? It is odd that only one node gets selected. Could it otherwise be that 10.14.36.143 cannot reach all of the nodes in the remote cluster? Have you looked at the logs to see whether it's trying to connect?

Cheers
Luca

@Tim_Brooks, @javanna - thank you very much for your answers, the situation is much clearer now.

@javanna - we do not want to restrict the nodes in big clusters to a small subset of cluster nodes, but we still want to understand which specific nodes the queries are being routed through, for ease of debugging (e.g. this particular node was under heavy load at the moment a heavy query was passing through, which is why the query timed out; or we had packet loss between those nodes, which is why the search results were incomplete; etc.).
Also, tagging specific nodes as gateways requires node restarts, which is somewhat problematic for existing heavily loaded clusters in production.

I have tried deleting the cluster settings we currently have under search.remote (on Elasticsearch 6.8.3), and it looks like there is some kind of bug.

So currently we have

 "persistent" : {
"cluster" : {
  "remote" : {
  -------some clusters-------
"search" : {
  "remote" : {
    "ams01" : {
      "seeds" : [
        "10.48.36.200:9301",
        "10.48.36.70:9301",
        "10.48.36.200:9303",
        "10.48.36.228:9301",
        "10.48.36.195:9301",
        "10.48.36.69:9301",
        "10.48.36.200:9302",
        "10.48.36.228:9302",
        "10.48.36.71:9301",
        "10.48.36.195:9302"
      ]

After running
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent": {"search": {"remote": {"ams01": {"seeds": null}}}}}'

I can see the following line in the log:
[2020-03-06T22:04:14,895][INFO ][o.e.c.s.ClusterSettings ] [sjc01-c02-kcs10]updating [cluster.remote.ams01.seeds] from [["10.48.36.200:9301","10.48.36.70:9301","10.48.36.200:9303","10.48.36.228:9301","10.48.36.195:9301","10.48.36.69:9301","10.48.36.200:9302","10.48.36.228:9302","10.48.36.71:9301","10.48.36.195:9302"]] to [[]]
and I can see

"persistent" : {
    "cluster" : {
      "remote" : {
        "ams01" : {
          "seeds" : [  ]
        },

and I can still see:

 "search" : {
      "remote" : {
        "ams01" : {
          "seeds" : [
            "10.48.36.200:9301",
            "10.48.36.70:9301",
            "10.48.36.200:9303",
            "10.48.36.228:9301",
            "10.48.36.195:9301",
            "10.48.36.69:9301",
            "10.48.36.200:9302",
            "10.48.36.228:9302",
            "10.48.36.71:9301",
            "10.48.36.195:9302"
          ]

It looks like 6.8.3 does not properly process updates to settings defined in the 5.6 format (e.g. if you try to update the seeds, the update is applied to the cluster.remote settings instead).

_remote/info reflects the changes though...

Is there a way to force 6.8.3 to clear the search.remote settings? I would be OK with deleting all search.remote settings completely, as it looks like they are duplicated in cluster.remote now anyways.

Also - what happens if we have different definitions of remote cluster in cluster.remote and search.remote at the same time?

Hi,
cluster.remote has a fallback to search.remote. That means that if you define remote clusters with different names, they will be merged, hence all will be valid. But if you define the same remote cluster with both settings, as I think you do, the one defined in cluster.remote should take precedence over the other.
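
One way to double-check which definition is in effect for a given alias (a sketch) is to compare the seeds reported by the remote info API with the two entries in the cluster settings:

curl -s 'localhost:9200/_remote/info?pretty'
curl -s 'localhost:9200/_cluster/settings?pretty'

Per the precedence described above, the seeds reported by _remote/info should match the cluster.remote entry rather than the search.remote one.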

I am not sure, though, what is going wrong with your calls; I am afraid it may get messy when you have the same cluster defined on both sides. Have you tried setting them both to null and starting from scratch?

When it comes to routing CCS requests, note that the node(s) that the requests are being routed to do not necessarily have all the data. Those can just be proxy nodes, as they are the only ones that can receive connections from the outside, but the queries are executed where the data is, which is not so different from an ordinary search.

Also note that this has changed quite a bit in 7.0 with the introduction of the ability to minimize roundtrips by default. See https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cross-cluster-search.html#ccs-network-delays .
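
For reference, in 7.x the round-trip minimization can be toggled per request with the ccs_minimize_roundtrips parameter (a sketch; aws71-c01:logs-* is just an illustrative index pattern):

curl -s 'localhost:9200/aws71-c01:logs-*/_search?ccs_minimize_roundtrips=false&pretty' -H 'Content-Type: application/json' -d'
{ "query": { "match_all": {} } }'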

Have you tried setting them both to null and starting from scratch?

As far as I can see - I can delete a cluster from cluster.remote by setting its seeds and optional settings to null (as mentioned in Remote clusters | Elasticsearch Guide [6.8] | Elastic).
The cluster gets removed from cluster.remote - I do not see it in the output of _cluster/settings nor in _remote/info (so it looks like the fallback to search.remote does not happen in that particular case).

I would very much like to remove all clusters from search.remote and leave them only in cluster.remote - just to have a cleaner configuration though.
The problem is that there appears to be no way in 6.8.3 to delete a cluster from search.remote: setting the "seeds" to null, as recommended in Cross-cluster search | Elasticsearch Reference [5.6] | Elastic, does not actually delete the cluster (instead it adds it to cluster.remote with an empty list of seeds).
Adding all the optional cluster.remote parameters to the search.remote call, e.g.

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent": {"search": {"remote": {"ams01": {"seeds": null, "skip_unavailable" : null, "transport" : {"ping_schedule" : null}}}}}}'

results in the expected error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"persistent setting [search.remote.ams01.transport.ping_schedule], not recognized"}],"type":"illegal_argument_exception","reason":"persistent setting [search.remote.ams01.transport.ping_schedule], not recognized"},"status":400}

Is there a way to completely reset the search.remote part of the cluster configuration?
Please note that it is part of the persistent cluster settings, so restarting the cluster probably would not help. Or do we have to completely re-create the cluster from scratch in that case?

Regarding the num_nodes_connected stat - could you please elaborate on how those connections should manifest?

Am I understanding correctly that, if I see num_nodes_connected = 3 on a CCS node, I should at any time be able to see 3 established TCP connections to 3 different IPs on port 9300?

Or will those connections come and go, getting dropped and re-established as needed, meaning we should not correlate num_nodes_connected with the number of established TCP connections at any given moment?

Basically, we are trying to understand how we should monitor our CCS cluster via Telegraf to ensure that all nodes are connected to all clusters with maximum redundancy.
Can we use the num_nodes_connected metric for this, or do we have to rely only on the connected stat for a cluster and not try to monitor the number of connected nodes per cluster? Or do we have to monitor the number of established TCP connections to port 9300?

I think that's right, search.remote.*.transport.ping_schedule isn't a setting, because the per-remote-cluster ping schedule was introduced in #34753 (i.e. 6.6.0), which was after the search.remote.* settings were deprecated in #33413 (i.e. 6.5.0). I don't think it should be possible to set search.remote.ams01.transport.ping_schedule. Are you sure that search.remote.ams01.transport.ping_schedule is set?
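
A quick way to check (a sketch) is to dump the settings in flat form and grep for the old namespace:

curl -s 'localhost:9200/_cluster/settings?flat_settings=true&pretty' | grep 'search.remote'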

@DavidTurner - you are correct, it is not set, and indeed should not be set.

The actual issue is that
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent": {"search": {"remote": {"ams01": {"seeds": null}}}}}'
does not delete the cluster from search.remote in 6.8.3 despite returning
{"acknowledged":true,"persistent":{"cluster":{"remote":{"ams01":{"seeds":[]}}}},"transient":{}}
It adds the ams01 cluster to cluster.remote with an empty seeds list if the ams01 cluster is not already present there.

It looks like it is impossible to delete those settings from search.remote under 6.8. Am I correct - or am I missing a way of clearing out search.remote?

As far as I can see, search.remote does not affect anything, but I would not call having a section of "immutable" configuration present forever in your cluster settings a good thing.

This sounds like a possible bug to me, but if so we'll need some help to reproduce it before we can fix it. I can't see anything obviously wrong from just looking at the code. Can you share the full output of GET _cluster/settings? Can you tell us how you got into this state? It sounds like you've done (one or more) rolling upgrades from earlier versions?

Sure, but the outputs of _cluster/settings and _remote/info are more than 13k characters long each; please let me know how you would prefer to have them shared with you.

I think the reproduction should be pretty easy though: create a CCS setup under 5.6.17 (with definitions in search.remote) and perform a rolling upgrade directly to 6.8.3. You should get stuck in the same limbo as we are in now :slight_smile:

We are currently on version 6.8.3.
We tested CCS on 5.6 and performed a rolling upgrade to 6.8.3.

If I try to delete the cluster using
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'{"persistent": {"search": {"remote": {"ams01": {"seeds": null}}}}}'

I get {"acknowledged":true,"persistent":{"cluster":{"remote":{"ams01":{"seeds":[]}}}},"transient":{}} and the following line in the log:
[2020-03-09T22:10:43,272][INFO ][o.e.c.s.ClusterSettings ] [sjc01-c02-kcs10]updating [cluster.remote.ams01.seeds] from [["10.48.36.200:9301","10.48.36.70:9301","10.48.36.200:9303","10.48.36.228:9301","10.48.36.195:9301","10.48.36.69:9301","10.48.36.200:9302","10.48.36.228:9302","10.48.36.71:9301","10.48.36.195:9302"]] to [[]]

Note that the response and the log line talk about updating cluster.remote but do not say anything about search.remote.

Basically, I am currently trying to clean up the cluster settings to ensure we are using only cluster.remote; after that we can return to the connection-related questions, although some of them are theoretical and can probably be answered even with the cluster in its current state.

