Hello,
We are trying to replace our tribe-node setup with a cross-cluster search setup, and we have multiple questions about the general configuration and monitoring of a cross-cluster-search cluster.
Unfortunately, the documentation does not cover many aspects of the inner workings of cross-cluster search.
Could you please answer the following questions:
1. Am I understanding correctly that gateway nodes are chosen ONLY at the moment of cluster registration?
2. How are those gateway nodes chosen? Is there an algorithm for this?
3. Is the chosen gateway nodes list propagated across all nodes of the cross-cluster-search cluster, or does each node generate its own gateway nodes list separately?
4. Are the search requests going through the gateway nodes balanced somehow (round-robin? some other algorithm?), or do all search requests from a cross-cluster-search node end up on one gateway node, with the other two acting as backups?
5. What happens if one or two of the chosen gateway nodes become unresponsive? Should the cross-cluster-search node that has lost its connections to those gateway nodes eventually choose different gateway nodes?
5a. If not, how can we make a node re-select gateway nodes and re-establish the connections?
5b. If yes, how quickly should it detect the loss of connectivity? Can this value be configured?
6. Is there a way to monitor which specific gateway nodes have been chosen by a cross-cluster-search node? As far as I can tell, the "http_addresses" field has been removed in Elasticsearch 7.x.
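For context, the only visibility we currently have is the _remote/info API. On 6.8.3 it reports seeds, http_addresses, and connection counts, but, as far as we can tell, it does not name the gateway nodes that this particular local node selected. For example:

```shell
# What we can inspect today: the per-remote connection summary on one node.
# (As far as we can tell, this does not list the selected gateway nodes.)
curl -s '10.48.37.18:9200/_remote/info?pretty'
```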
7. Is there a difference between

"persistent" : {
  "cluster" : {
    "remote" : {

and

"persistent" : {
  "search" : {
    "remote" : {

as locations for configuring cross-cluster-search settings?
As far as we can see, Elasticsearch 6.8.3 allows the same cross-cluster-search settings to be configured in both locations simultaneously.
Which settings take precedence if we have contradicting cross-cluster-search settings in both locations?
Am I understanding correctly that

"persistent" : {
  "search" : {
    "remote" : {

is the location that should be used going forward?
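To make question 7 concrete: on 6.8.3 a single settings update is accepted with the same remote defined in both locations (the cluster name and seed addresses below are taken from our own setup, and the contradicting seeds are deliberate):

```shell
# Accepted on 6.8.3: the same remote cluster configured under both
# cluster.remote and search.remote, with contradicting seed lists.
curl -s -XPUT '10.48.37.18:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "persistent" : {
    "cluster" : { "remote" : { "aws71-c01" : { "seeds" : [ "10.3.6.63:9301" ] } } },
    "search"  : { "remote" : { "aws71-c01" : { "seeds" : [ "10.3.6.66:9301" ] } } }
  }
}'
```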
8. Is there a way to completely remove a remote cluster from the cross-cluster search configuration, or to completely clear an existing CCS configuration from a CCS cluster?
As far as we were able to see, there is currently no way in Elasticsearch 6.8.3 to completely nullify either

"persistent" : {
  "cluster" : {
    "remote" : {

or

"persistent" : {
  "search" : {
    "remote" : {

We were only able to empty the seeds list for the clusters we no longer need.
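For reference, this is roughly the removal attempt we made, unsetting each remote-cluster setting with null (the documented way to clear persistent settings); afterwards the cluster still lists "aws71-c01", only with an empty seeds list:

```shell
# Our removal attempt: null out every setting of the remote cluster.
# Result on 6.8.3: "aws71-c01" still appears, just with empty seeds.
curl -s -XPUT '10.48.37.18:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "aws71-c01" : {
          "seeds" : null,
          "skip_unavailable" : null,
          "transport" : { "ping_schedule" : null }
        }
      }
    }
  }
}'
```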
Could you please also help us understand the state our test cross-cluster-search cluster is in right now?
We have a cluster of 4 cross-cluster-search nodes on Elastic 6.8.3.
ip node.role name
10.48.37.18 mdi ams01-c02-kcs10
10.49.37.16 mdi zrh01-c02-kcs10
10.13.137.84 mdi sjc01-c02-kcs10
10.14.36.143 mdi iad01-c02-kcs10
We have the following cluster settings for one of the remote clusters:
{
"persistent" : {
"cluster" : {
"remote" : {
<...>
"aws71-c01" : {
"skip_unavailable" : "true",
"seeds" : [
"10.3.6.63:9301",
"10.3.6.66:9301",
"10.3.6.85:9301"
],
"transport" : {
"ping_schedule" : "30s"
}
},
<...>
But the output of _remote/info differs between nodes in the cluster: they show different "num_nodes_connected" values.
See, for example, the node ams01-c02-kcs10:
curl -s 10.48.37.18:9200/_remote/info?pretty | grep -A 16 "aws71-c01"
"aws71-c01" : {
"seeds" : [
"10.3.6.66:9301",
"10.3.6.85:9301",
"10.3.6.63:9301"
],
"http_addresses" : [
"10.3.6.66:9201",
"10.3.6.63:9201",
"10.3.6.85:9201"
],
"connected" : true,
"num_nodes_connected" : 3,
"max_connections_per_cluster" : 3,
"initial_connect_timeout" : "30s",
"skip_unavailable" : true
},
and the node iad01-c02-kcs10:
curl -s 10.14.36.143:9200/_remote/info?pretty | grep -A 16 "aws71-c01"
"aws71-c01" : {
"seeds" : [
"10.3.6.66:9301",
"10.3.6.85:9301",
"10.3.6.63:9301"
],
"http_addresses" : [
"10.3.6.66:9201",
"10.3.6.85:9201",
"10.3.6.124:9201"
],
"connected" : true,
"num_nodes_connected" : 1,
"max_connections_per_cluster" : 3,
"initial_connect_timeout" : "30s",
"skip_unavailable" : true
},
If we restart the iad01-c02-kcs10 node, we can probably get num_nodes_connected back to 3, but we were not able to influence the number by any other means (including sending search requests).
Is this the only way to restore the correct number of connections? Or are we wrong to use num_nodes_connected as a monitoring metric for the state of cross-cluster search?
What steps can we take to investigate this issue further?
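For what it is worth, the comparison we use to spot the discrepancy is just a loop over the four CCS nodes from the listing above:

```shell
# Compare num_nodes_connected for the aws71-c01 remote on every CCS node.
for ip in 10.48.37.18 10.49.37.16 10.13.137.84 10.14.36.143; do
  printf '%s: ' "$ip"
  curl -s "$ip:9200/_remote/info?pretty" \
    | grep -A 16 '"aws71-c01"' \
    | grep '"num_nodes_connected"'
done
```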