Cross cluster search stops working after some time. ES: 6.1.1 Kibana: 6.0

dantete · January 5, 2018, 7:43pm

I am testing cross-cluster search.
I have 2 clusters with 2 nodes each, but it only works right after I restart both clusters.
After a restart I can see the results from both clusters in Kibana and in the console (through the search API).
I can also see the results of the _remote/info API.

The problem is: after several minutes (between 10 and 20 min), cross-cluster search doesn't work anymore.

Kibana shows:
Error: Request Timeout after 120000ms
at http://192.168.2.240:562/bundles/kibana.bundle.js?v=16070:13:4431
at http://192.168.2.240:562/bundles/kibana.bundle.js?v=16070:13:4852

After several minutes without a response, the _remote/info API returns this error:
curl 10.98.129.90:9203/_remote/info?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "node_disconnected_exception",
        "reason" : "[node-1][10.98.119.90:9300][cluster:monitor/nodes/info] disconnected"
      }
    ],
    "type" : "node_disconnected_exception",
    "reason" : "[node-1][10.98.119.90:9300][cluster:monitor/nodes/info] disconnected"
  },
  "status" : 500
}
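
To make the two kinds of _remote/info responses in this thread easier to tell apart, here is a minimal Python sketch. It assumes the response body has already been fetched as text; the function name `summarize_remote_info` is hypothetical and not part of any Elasticsearch client.

```python
import json

def summarize_remote_info(body):
    """Return one human-readable line per problem found in a _remote/info response."""
    doc = json.loads(body)
    # An error response carries a top-level "error" object plus a "status" code.
    if "error" in doc:
        err = doc["error"]
        return ['request failed (%s): %s: %s' % (doc.get("status"), err.get("type"), err.get("reason"))]
    problems = []
    # A normal response maps each remote cluster alias to its connection info.
    for alias, info in doc.items():
        if not info.get("connected", False) or info.get("num_nodes_connected", 0) == 0:
            problems.append("%s: not connected" % alias)
    return problems

# The error body from the post, compacted.
error_body = """{
  "error": {
    "root_cause": [{"type": "node_disconnected_exception",
                    "reason": "[node-1][10.98.119.90:9300][cluster:monitor/nodes/info] disconnected"}],
    "type": "node_disconnected_exception",
    "reason": "[node-1][10.98.119.90:9300][cluster:monitor/nodes/info] disconnected"
  },
  "status": 500
}"""

for line in summarize_remote_info(error_body):
    print(line)
```

An empty list means every listed remote reports itself as connected, which is what the healthy response right after a restart looks like.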

node-1 is always connected and responds to ping.
There is no firewall between the 2 clusters.
The firewalld service is stopped on both clusters.
All traffic is permitted between the 2 clusters.

Does anybody know what is wrong?

Regards!

Clusters and nodes INFO:-----------------------------------------------------------------------
Clusters:
my-cluster-a: 10.98.119.90
node-1 (data only), JVM instance
node-2 (master-eligible, data), JVM instance

my-cluster-b: 10.98.129.90
node-3 (data only), JVM instance
node-4 (master-eligible, data), JVM instance

my-cluster-a is added as a remote cluster in my-cluster-b.
Versions:
Logstash, Kibana: 6.0
Elasticsearch: 6.1.1
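
For readers who want to reproduce the setup, this is a rough Python sketch of the request body that registers my-cluster-a as a remote cluster, plus the path of a cross-cluster search. It assumes the `search.remote.<alias>.seeds` setting used by Elasticsearch releases before 6.5 (later releases renamed it to `cluster.remote`). The helper names and the `logstash-*` index pattern are hypothetical; the post does not say which indices were queried or how the remote was actually registered.

```python
import json

def remote_cluster_settings(alias, seeds, prefix="search.remote"):
    """Build a PUT _cluster/settings body that registers `alias` as a remote cluster."""
    return {"persistent": {"%s.%s.seeds" % (prefix, alias): list(seeds)}}

def ccs_search_path(alias, index_pattern):
    """Path for searching `index_pattern` on the remote cluster `alias`."""
    return "/%s:%s/_search" % (alias, index_pattern)

# Transport addresses of my-cluster-a, as listed under "seeds" in the post.
body = remote_cluster_settings("my-cluster-a", ["10.98.119.90:9300", "10.98.119.90:9301"])
print(json.dumps(body, indent=2))

# "logstash-*" is an assumed index pattern, used only for illustration.
print(ccs_search_path("my-cluster-a", "logstash-*"))
```

The body would be sent to one node of my-cluster-b, and the search path would be requested on my-cluster-b as well, since that is the cluster doing the cross-cluster search.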

_remote/info my-cluster-a
curl 10.98.119.90:9201/_remote/info?pretty
{ }

_remote/info my-cluster-b (works only once or twice after restarting both clusters)
curl 10.98.129.90:9202/_remote/info?pretty
{
  "my-cluster-b" : {
    "seeds" : [
      "10.98.129.90:9302",
      "10.98.129.90:9303"
    ],
    "http_addresses" : [
      "10.98.129.90:9202",
      "10.98.129.90:9203"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  },
  "my-cluster-a" : {
    "seeds" : [
      "10.98.119.90:9300",
      "10.98.119.90:9301"
    ],
    "http_addresses" : [
      "10.98.119.90:9200",
      "10.98.119.90:9201"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}

Netstat on my-cluster-b when cross-cluster search doesn't work:

netstat -nap | grep 10.98.119.90
tcp6 0 0 10.98.129.90:57326 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:57323 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:57324 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:57304 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:57305 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:57306 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:38929 10.98.119.90:9300 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:38930 10.98.119.90:9300 ESTABLISHED 60433/java
tcp6 0 84 10.98.129.90:38900 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:57307 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:38903 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38926 10.98.119.90:9300 ESTABLISHED 60433/java
tcp6 0 62 10.98.129.90:57308 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:38902 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38927 10.98.119.90:9300 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:57322 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38904 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38928 10.98.119.90:9300 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:57325 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:57303 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:57327 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38899 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38901 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 0 10.98.129.90:38925 10.98.119.90:9300 ESTABLISHED 60433/java
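
A netstat listing like this one is easier to read once summarized. The Python sketch below is only an illustration: it assumes the usual `netstat -nap` column order (proto, Recv-Q, Send-Q, local address, remote address, state, PID/program), and the four sample rows assume the local and remote columns of the listing pair up in the order shown. It counts ESTABLISHED connections per remote address and process, and lists sockets with a non-zero Send-Q, meaning bytes the remote side has not yet acknowledged.

```python
from collections import Counter

def count_connections(netstat_output):
    """Count ESTABLISHED connections per (remote address, PID/program)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto recv-q send-q local remote state pid/program
        if len(fields) < 7 or fields[5] != "ESTABLISHED":
            continue
        counts[(fields[4], fields[6])] += 1
    return counts

def pending_sends(netstat_output):
    """Return (local address, Send-Q bytes) for sockets with unacknowledged data."""
    pending = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[2].isdigit() and int(fields[2]) > 0:
            pending.append((fields[3], int(fields[2])))
    return pending

# Four sample rows in netstat format (pairing of columns is an assumption).
sample = """\
tcp6 0 0 10.98.129.90:57326 10.98.119.90:9301 ESTABLISHED 60593/java
tcp6 0 84 10.98.129.90:38900 10.98.119.90:9300 ESTABLISHED 60593/java
tcp6 0 62 10.98.129.90:57308 10.98.119.90:9301 ESTABLISHED 60433/java
tcp6 0 0 10.98.129.90:38929 10.98.119.90:9300 ESTABLISHED 60433/java
"""

for key, n in sorted(count_connections(sample).items()):
    print(key, n)
print(pending_sends(sample))
```

A Send-Q that stays above zero across repeated netstat runs may point to a connection the local kernel still considers open while the other end no longer answers, but that is a hint to investigate, not a diagnosis.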

rugenl · January 7, 2018, 3:56pm

I'm having a problem after upgrading to 6.1.1 too. I'm not sure if it's related to this, but can you check your Elasticsearch log and see if you're getting nearly continuous GC messages?

My cluster seems OK until I run a search that worked OK in 5.x. The search ran about 30 seconds before and returned a lot of data, but it worked.

Thanks

dantete · January 8, 2018, 6:53pm

Yes, I am getting continuous GC messages:

"[2018-01-08T15:14:18,377][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][259152] overhead, spent [306ms] collecting in the last [1s]"

When cross-cluster search stops working, I continuously get:

"[2018-01-08T11:59:47,436][WARN ][o.e.t.n.Netty4Transport ] [node-3] exception caught on transport layer [org.elasticsearch.transport.netty4.NettyTcpChannel@1d70087], closing connection
java.io.IOException: Expiró el tiempo de conexión" (Spanish system locale; in English: "Connection timed out")

rugenl · January 8, 2018, 6:58pm

Sounds like a garbage collection problem in 6.1.1; what breaks just depends on what you are doing.

dantete · January 10, 2018, 3:57pm

Based on this comment, these logs are normal. Moreover, I am getting only a few GC logs per hour. For example, at 8am:
Node-3:-----------------------
[2018-01-10T08:04:41,292][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][405989] overhead, spent [322ms] collecting in the last [1s]
[2018-01-10T08:14:10,063][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][406557] overhead, spent [354ms] collecting in the last [1s]
[2018-01-10T08:34:07,748][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][407753] overhead, spent [298ms] collecting in the last [1s]
[2018-01-10T08:44:06,484][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][408351] overhead, spent [271ms] collecting in the last [1s]
[2018-01-10T08:54:05,269][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][408949] overhead, spent [251ms] collecting in the last [1s]

Node-4:-----------------------
[2018-01-10T08:50:00,775][INFO ][o.e.m.j.JvmGcMonitorService] [node-4] [gc][409052] overhead, spent [252ms] collecting in the last [1s]
[2018-01-10T08:53:40,877][INFO ][o.e.m.j.JvmGcMonitorService] [node-4] [gc][409272] overhead, spent [288ms] collecting in the last [1s]

Questions:

Do these logs indicate a bad configuration or a cluster problem?
How are these logs related to the cross-cluster search queries?

rugenl · January 10, 2018, 4:15pm

OK, when I have the problem, I'm getting those every second. That's what I meant by "continuous".
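
The difference between "every second" and "a few per hour" can be checked directly from the log. Below is a minimal Python sketch that parses JvmGcMonitorService overhead lines in the format quoted in this thread and reports the shortest gap between consecutive messages on one node. The function names are hypothetical, and the regex only handles durations reported in milliseconds, as in the lines above.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2018-01-10T08:04:41,292][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][405989] overhead, spent [322ms] collecting in the last [1s]
GC_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[INFO ?\]\[o\.e\.m\.j\.JvmGcMonitorService\] \[(?P<node>[^\]]+)\] "
    r"\[gc\]\[\d+\] overhead, spent \[(?P<spent>\d+)ms\]"
)

def gc_events(log_text):
    """Return (node, timestamp, spent_ms) for every GC overhead line found."""
    events = []
    for m in GC_LINE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
        events.append((m.group("node"), ts, int(m.group("spent"))))
    return events

def min_gap_seconds(events, node):
    """Shortest time between two consecutive GC overhead messages on `node`."""
    times = sorted(ts for n, ts, _ in events if n == node)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return min(gaps) if gaps else None

# Three node-3 lines copied from the post above.
log = """\
[2018-01-10T08:04:41,292][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][405989] overhead, spent [322ms] collecting in the last [1s]
[2018-01-10T08:14:10,063][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][406557] overhead, spent [354ms] collecting in the last [1s]
[2018-01-10T08:34:07,748][INFO ][o.e.m.j.JvmGcMonitorService] [node-3] [gc][407753] overhead, spent [298ms] collecting in the last [1s]
"""

events = gc_events(log)
print(len(events), "GC overhead messages on node-3")
print("shortest gap (s):", min_gap_seconds(events, "node-3"))
```

A shortest gap of roughly one second over a long stretch would match the "continuous" pattern described here; gaps of many minutes, as in dantete's logs, would not.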

system · February 7, 2018, 4:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
