ES starts, but Cross Cluster Search shows local data node disconnected

I have two lab ES 6.2.2 data nodes, one of which also acts as a Cross Cluster Search (CCS) node. When the CCS node reboots and my startup script launches ES for the first time, curl shows the node is up, but when I dig a little deeper and look at the CCS connectivity, the local cluster shows as disconnected. If I then restart ES, everything works fine.

Here's the initial curl output following a reboot:

curl http://"<IPv6_address>":9200

{
  "name" : "node2",
  "cluster_name" : "cluster2",
  "cluster_uuid" : "oaTc47YtQ_WFip5zoRvzag",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
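
A side note on the URL in that curl command: an IPv6 literal in a URL normally has to be wrapped in square brackets, which is also how the addresses appear in the CCS output. The sketch below shows one way to build such a URL; http_url is a hypothetical helper name and the default port is an assumption taken from the commands in this post.

```python
import ipaddress


def http_url(host, port=9200):
    """Build an http:// URL, bracketing the host if it is an IPv6 literal.

    `http_url` is a hypothetical helper for illustration only.
    """
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, so treat it as a hostname
    bracketed = "[{}]".format(host) if is_v6 else host
    return "http://{}:{}".format(bracketed, port)
```

Hostnames and IPv4 addresses pass through unchanged, so the same helper works for node1 and for the IPv6 address.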

But when I look at the CCS info, here's what I see:

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "cluster2" : {
    "seeds" : [ ],
    "http_addresses" : [ ],
    "connected" : false,
    "num_nodes_connected" : 0,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  },
  "cluster1" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}
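
For anyone who wants to check this from a script rather than by eye, here is a minimal sketch that lists the remote clusters reporting "connected" : false. It assumes a successful /_remote/info response shaped like the output above; the function names and the default base URL are my own assumptions, not part of Elasticsearch.

```python
import json
from urllib.request import urlopen


def disconnected_clusters(info):
    """Return the sorted aliases of remote clusters that are not connected.

    `info` is the decoded JSON body of a successful GET /_remote/info:
    a mapping from cluster alias to a status object with a boolean
    "connected" field. A missing field is treated as not connected.
    """
    return sorted(
        alias for alias, status in info.items()
        if not status.get("connected", False)
    )


def fetch_remote_info(base_url="http://localhost:9200", timeout=10):
    """Fetch and decode /_remote/info (base_url is an assumed default)."""
    with urlopen(base_url + "/_remote/info", timeout=timeout) as response:
        return json.load(response)
```

Calling disconnected_clusters(fetch_remote_info()) on the node in the state shown above should name cluster2. Note that urlopen raises an HTTPError on a 5xx response, so error bodies never reach the first function.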

Here's my elasticsearch.yml file on the CCS node, with all commented-out lines removed for brevity:

cluster.name: cluster2
node.name: node2
bootstrap.memory_lock: true
network.bind_host: 0.0.0.0
network.publish_host: eth0:ipv6
network.tcp.keep_alive: true
bootstrap.system_call_filter: false
search.remote.cluster1.seeds: "<node1_name>:9300"
search.remote.cluster2.seeds: "<node2_name>:9300"

Has anyone seen this behavior before, and if so, what did you do to solve it? Thanks in advance!

I checked again this morning. Both of my nodes responded as up, but the CCS status query returned a node_disconnected_exception for remote node1. The exception took several minutes to come back after I issued the curl command. Here's what I'm seeing; all of these commands were issued on the CCS node, node2. Any ideas on how to fix this would be much appreciated. Please note that I don't have any data being indexed yet.

curl http://localhost:9200

{
  "name" : "node2",
  "cluster_name" : "cluster2",
  "cluster_uuid" : "oaTc47YtQ_WFip5zoRvzag",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl http://node1:9200

{
  "name" : "node1",
  "cluster_name" : "cluster1",
  "cluster_uuid" : "IbNavfZmRLazxMjYsQ2d3w",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "node_disconnected_exception",
        "reason" : "[node1][[<IPv6_address>]:9300][cluster:monitor/nodes/info] disconnected"
      }
    ],
    "type" : "node_disconnected_exception",
    "reason" : "[node1][[<IPv6_address>]:9300][cluster:monitor/nodes/info] disconnected"
  },
  "status" : 500
}
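
Since the same query succeeds a little later, a startup or health-check script could poll instead of trusting a single call. Below is a rough sketch of that idea, not a fix for the underlying problem. The function names, retry counts, and the rule that any top-level "error" key means "not ready" are all assumptions for illustration.

```python
import time


def all_connected(payload):
    """True only for a non-empty /_remote/info body with every cluster connected.

    An error body like the one above (top-level "error" key) counts as
    not connected. This assumes no remote cluster is aliased "error".
    """
    if not isinstance(payload, dict) or not payload or "error" in payload:
        return False
    return all(
        isinstance(status, dict) and status.get("connected") is True
        for status in payload.values()
    )


def wait_for_remotes(fetch, attempts=10, delay=5.0, sleep=time.sleep):
    """Call fetch() until all remotes are connected; return True on success.

    `fetch` is any zero-argument callable returning the decoded JSON body.
    Exceptions raised by fetch (for example on an HTTP 500) are treated
    as "not connected yet" and retried.
    """
    for attempt in range(attempts):
        try:
            if all_connected(fetch()):
                return True
        except Exception:
            pass  # transport or HTTP error: try again
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

Passing fetch and sleep in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.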

curl http://node1:9200

{
  "name" : "node1",
  "cluster_name" : "cluster1",
  "cluster_uuid" : "IbNavfZmRLazxMjYsQ2d3w",
  "version" : {
    "number" : "6.2.2",
    "build_hash" : "10b1edd",
    "build_date" : "2018-02-16T19:01:30.685723Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -XGET 'localhost:9200/_remote/info?pretty'

{
  "cluster2" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  },
  "cluster1" : {
    "seeds" : [
      "[<IPv6_address>]:9300"
    ],
    "http_addresses" : [
      "[<IPv6_address>]:9200"
    ],
    "connected" : true,
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}

It looks like this was related to my JVM heap settings. They were set in the jvm.options file to the defaults of -Xms1g and -Xmx1g. My servers have 49 GB of memory, so I changed the settings to -Xms24g and -Xmx24g and rebooted. ES started, and CCS now shows both clusters connected.
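
For reference, the 24g figure matches the common rule of thumb of giving the heap about half of physical RAM while staying below roughly 32 GB. Here is a small sketch of that arithmetic. The rule itself is general guidance rather than something established in this thread, and the function names and the 31 GB cap are assumptions for illustration.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=31):
    """Half of physical RAM, rounded down, capped at cap_gb, minimum 1.

    The 31 GB default cap is an assumed stand-in for "stay below ~32 GB".
    """
    return max(1, min(int(total_ram_gb) // 2, cap_gb))


def jvm_heap_options(total_ram_gb):
    """Return matching -Xms/-Xmx lines for jvm.options."""
    heap = suggested_heap_gb(total_ram_gb)
    return ["-Xms{}g".format(heap), "-Xmx{}g".format(heap)]
```

For a 49 GB server this yields -Xms24g and -Xmx24g, the values used above. It does not explain why the 1g default led to the CCS disconnect; it only reproduces the sizing.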
