ES starts, but Cross Cluster Search shows local data node disconnected

mtowle · September 5, 2018, 7:43pm

I have two lab ES 6.2.2 data nodes, one of which also acts as the Cross Cluster Search (CCS) node. After a reboot, a startup script launches ES on the CCS node. On that first start, curl shows the node is up, but the CCS connection info shows the local cluster as disconnected. If I then restart ES, everything works fine.

Here's the initial curl output following a reboot:

curl 'http://[<IPv6_address>]:9200'

{
  "name" : "node2",
  "cluster_name" : "cluster2",
  "cluster_uuid" : "oaTc47YtQ_WFip5zoRvzag",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

But when I look at the CCS info, here's what I see:

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "cluster2" : {
    "seeds" : [ ],
    "http_addresses" : [ ],
    "connected" : false,
    "num_nodes_connected" : 0,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  },
  "cluster1" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}
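
A check like the one above can be scripted. This is a minimal sketch in Python, assuming the _remote/info response has the shape shown in this post. The function name disconnected_clusters is hypothetical, and the sample data is trimmed from the output above with the address placeholder left as-is.

```python
import json


def disconnected_clusters(remote_info):
    """Return the sorted aliases of remote clusters that report connected == false.

    remote_info is the parsed JSON body of GET /_remote/info.
    Raises ValueError if the body is an error response instead of cluster info.
    """
    if "error" in remote_info:
        raise ValueError("_remote/info returned an error response")
    return sorted(
        alias
        for alias, info in remote_info.items()
        if not info.get("connected", False)
    )


# Sample response, trimmed from the output in this post.
SAMPLE = json.loads("""
{
  "cluster2": {"seeds": [], "http_addresses": [],
               "connected": false, "num_nodes_connected": 0},
  "cluster1": {"seeds": ["[<IPv6_address>]:9300"],
               "connected": true, "num_nodes_connected": 1}
}
""")

if __name__ == "__main__":
    print(disconnected_clusters(SAMPLE))
```

Running it against the sample reports only cluster2, which matches what the raw output shows.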

Here's my elasticsearch.yml file on the CCS node, with all commented out lines removed for brevity:

cluster.name: cluster2
node.name: node2
bootstrap.memory_lock: true
network.bind_host: 0.0.0.0
network.publish_host: _eth0:ipv6_
network.tcp.keep_alive: true
bootstrap.system_call_filter: false
search.remote.cluster1.seeds: "<node1_name>:9300"
search.remote.cluster2.seeds: "<node2_name>:9300"

Has anyone seen this behavior before, and if so, what did you do to solve it? Thanks in advance!

mtowle · September 7, 2018, 12:17pm

When I checked this morning, both nodes responded to curl, but the CCS status request failed with a node_disconnected_exception for remote node1. The exception took several minutes to come back after I issued the curl command. Here is what I'm seeing; all of these commands were issued on the CCS node, node2. Any ideas on how to fix this would be much appreciated. Note that no data is being indexed yet.

curl http://localhost:9200

{
  "name" : "node2",
  "cluster_name" : "cluster2",
  "cluster_uuid" : "oaTc47YtQ_WFip5zoRvzag",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl http://node1:9200

{
  "name" : "node1",
  "cluster_name" : "cluster1",
  "cluster_uuid" : "IbNavfZmRLazxMjYsQ2d3w",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "node_disconnected_exception",
        "reason" : "[node1][[<IPv6_address>]:9300][cluster:monitor/nodes/info] disconnected"
      }
    ],
    "type" : "node_disconnected_exception",
    "reason" : "[node1][[<IPv6_address>]:9300][cluster:monitor/nodes/info] disconnected"
  },
  "status" : 500
}

curl http://node1:9200

{
  "name" : "node1",
  "cluster_name" : "cluster1",
  "cluster_uuid" : "IbNavfZmRLazxMjYsQ2d3w",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "cluster2" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  },
  "cluster1" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}
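
Since the same request fails once and then succeeds, a retry loop makes the sequence easier to reproduce than issuing curl by hand. This is a rough sketch in Python, not a fix for the underlying problem. The names wait_until_connected and fetch_remote_info, the attempt count, and the delay are all illustrative assumptions; only the http://localhost:9200/_remote/info URL comes from the commands above.

```python
import json
import time
import urllib.request


def fetch_remote_info(url="http://localhost:9200/_remote/info"):
    """Fetch and parse _remote/info. urlopen raises on HTTP 500 responses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_until_connected(fetch, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until every remote cluster reports connected.

    fetch is any callable returning the parsed _remote/info JSON as a dict.
    Returns True once all clusters are connected, False if attempts run out.
    """
    for attempt in range(attempts):
        try:
            info = fetch()
        except Exception:
            # Treat any failure (HTTP 500, timeout, refused) as "not ready yet".
            info = None
        if (
            isinstance(info, dict)
            and info
            and "error" not in info
            and all(
                isinstance(c, dict) and c.get("connected")
                for c in info.values()
            )
        ):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    ok = wait_until_connected(fetch_remote_info, attempts=10, delay_seconds=30.0)
    print("all remote clusters connected" if ok else "still disconnected")
```

Passing the fetch function in as an argument keeps the loop testable without a running cluster.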

mtowle · September 13, 2018, 3:37pm

It looks like this was related to my JVM heap settings. The jvm.options file still had the defaults, -Xms1g and -Xmx1g. My servers have 49GB of memory, so I changed the settings to -Xms24g and -Xmx24g and rebooted. ES started, and CCS now shows both clusters connected.
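
For anyone sizing their own nodes, the values above line up with the commonly cited Elasticsearch guidance: set -Xms and -Xmx to the same value, give the heap no more than about half of physical RAM, and stay below the roughly 32GB compressed object pointers threshold. Here is a small sketch of that arithmetic in Python. The function names and the 31GB cap are illustrative assumptions, not values taken from this thread.

```python
def suggest_heap_gb(total_ram_gb, cap_gb=31):
    """Suggest a heap size in whole GB: half of RAM, capped, and at least 1.

    cap_gb=31 is an assumed conservative ceiling chosen to stay under the
    ~32GB compressed-oops threshold; the exact cutoff varies by JVM.
    """
    half = total_ram_gb // 2
    return max(1, min(half, cap_gb))


def jvm_options_lines(heap_gb):
    """Return the two jvm.options lines, with min and max heap set equal."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    heap = suggest_heap_gb(49)  # 49GB of RAM, as on the servers above
    for line in jvm_options_lines(heap):
        print(line)
```

With 49GB of RAM this yields 24, the same heap size used in the fix above.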

system · October 11, 2018, 3:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
