Elasticsearch CCS: client get timeout when remote cluster is isolated by firewall

ES version: 6.2.4
OS: RedHat
Environment: We have 2 clusters (A and B) in 2 different data centers. For searching we use cross-cluster search (CCS): we query one cluster (A), and cluster A then fetches the relevant data from cluster B for us. Normally everything works fine.

Issue: Sometimes, to test the availability of the 2 clusters, we disable the network between cluster A and cluster B (by shutting down a network device or with a firewall). We expect cluster A to detect the disconnection from cluster B quickly, but sometimes cluster A cannot detect it within one minute.

Test: I reproduced this with CCS between 2 small clusters (using a firewall/iptables to cut the network), and it appears to be related to the kernel TCP keepalive configuration:

With the default TCP configuration (net.ipv4.tcp_keepalive_time=7200, net.ipv4.tcp_keepalive_intvl=75, net.ipv4.tcp_keepalive_probes=9), the cluster needs more than 10 minutes to detect the disconnection. Searches block until they time out, and the client gets the exception below (exception 1). The strangest part is that there is no ERROR/WARN entry in the ES logs.

With an updated TCP configuration (net.ipv4.tcp_keepalive_time=120, net.ipv4.tcp_keepalive_intvl=30, net.ipv4.tcp_keepalive_probes=2), searches still time out at first, but after 2~3 minutes they respond normally with data from cluster A only ("_clusters":{"total":2,"successful":1,"skipped":1}).
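
For reference, a minimal sketch of how these kernel values can be inspected and applied at runtime on Linux; the numbers are simply the ones from my test, not a recommendation:

# Inspect the current TCP keepalive settings
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# Apply the tuned values from the test above (runtime only, not persisted)
sudo sysctl -w net.ipv4.tcp_keepalive_time=120
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=30
sudo sysctl -w net.ipv4.tcp_keepalive_probes=2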

I am guessing there is no heartbeat/ping between the clusters while CCS is in use, unlike the transport.ping_schedule mechanism used on regular transport connections.

BTW, I am configuring CCS via the REST API with the following settings body:
{
  "persistent": {
    "search": {
      "remote": {
        "clusterA": {
          "skip_unavailable": "true",
          "seeds": [
            "X.X.X.X:9300"
          ]
        },
        "clusterB": {
          "skip_unavailable": "true",
          "seeds": [
            "X2.X2.X2.X2:9300"
          ]
        }
      }
    }
  },
  "transient": {}
}
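
For example, a body like this can be applied with something along these lines (the host is just a placeholder for one of the local nodes):

curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "search": {
      "remote": {
        "clusterA": { "skip_unavailable": "true", "seeds": ["X.X.X.X:9300"] },
        "clusterB": { "skip_unavailable": "true", "seeds": ["X2.X2.X2.X2:9300"] }
      }
    }
  }
}'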

exception 1:

java.lang.RuntimeException: java.io.IOException: listener timeout after waiting for [30000] ms
        at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397) ~[nashorn.jar:?]
        at jdk.nashorn.api.scripting.ScriptObjectMirror.callMember(ScriptObjectMirror.java:199) ~[nashorn.jar:?]
        at jdk.nashorn.api.scripting.NashornScriptEngine.invokeImpl(NashornScriptEngine.java:383) ~[nashorn.jar:?]
        at jdk.nashorn.api.scripting.NashornScriptEngine.invokeFunction(NashornScriptEngine.java:190) ~[nashorn.jar:?]
        at com.htsc.iscs.service.alarm.impl.RangeWatcherImpl.alarmCheck(RangeWatcherImpl.java:152) [service-2.0.0-SNAPSHOT.jar!/:2.0.0-SNAPSHOT]
        at com.htsc.iscs.watcher.WatcherManager.watch(WatcherManager.java:159) [classes!/:2.0.0-SNAPSHOT]
        at com.htsc.iscs.watcher.WatcherManager.access$100(WatcherManager.java:36) [classes!/:2.0.0-SNAPSHOT]
        at com.htsc.iscs.watcher.WatcherManager$1.run(WatcherManager.java:129) [classes!/:2.0.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_101]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_101]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_101]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:665) ~[elasticsearch-rest-client-6.2.4.jar!/:6.2.4]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:223) ~[elasticsearch-rest-client-6.2.4.jar!/:6.2.4]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:195) ~[elasticsearch-rest-client-6.2.4.jar!/:6.2.4]

@javanna

Thanks a lot for reporting this. Your analysis is accurate; I opened https://github.com/elastic/elasticsearch/issues/34405, where we are discussing some potential fixes for this problem. Stay tuned for a fix :wink:

One thing you could try is setting transport.ping_schedule to 5s in the elasticsearch.yml of the CCS node(s). I would love to hear if that improves things.

I have added "transport.ping_schedule: 5s" to the elasticsearch.yml of all data nodes that are also configured with "search.remote.connect: true" (the CCS nodes), and changed the IPv4 network config back to the defaults. This does not improve things.

And after about 15 minutes I see this warning in the ES log:

[2018-10-12T09:40:05,502][WARN ][o.e.t.n.Netty4Transport  ] [esdata-26] exception caught on transport layer [NettyTcpChannel{localAddress=/168.61.45.26:57816, remoteAddress=168.61.45.91/168.61.45.91:9300}], closing connection
java.io.IOException: No route to host
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:?]
	at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:?]
	at io.netty.buffer.PooledHeapByteBuf.setBytes(PooledHeapByteBuf.java:261) ~[netty-buffer-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106) ~[netty-buffer-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343) ~[netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

@javanna
Can I update "search.remote.connect" and "search.remote.node.attr" via the API, or do I need to change elasticsearch.yml and restart the service?

Those two are not dynamic settings; I think you can only set them in elasticsearch.yml, which requires a restart.
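
For example, on 6.x something like the following in elasticsearch.yml is what I mean (the "gateway" attribute name is only an illustration; the matching node.attr setting goes on the remote cluster's nodes that should be used as gateways):

# elasticsearch.yml on the local CCS node(s) -- static settings, require a restart
search.remote.connect: true
search.remote.node.attr: gateway

# elasticsearch.yml on the remote cluster's gateway-eligible nodes
node.attr.gateway: true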

Sadly, what you describe makes sense. The current transport ping mechanism acts as a keep-alive to prevent connections from being dropped, which may happen in some situations. But it does not wait for a response to the ping, nor does it support a timeout, which means that if the other side is down this mechanism does not help detect the failure any quicker.

Thanks for your feedback; we will need to improve the ping mechanism as I described in the issue linked above.

Cheers
Luca

If you are on Linux, could you also try setting /proc/sys/net/ipv4/tcp_retries2 to 6:

echo 6 | sudo tee /proc/sys/net/ipv4/tcp_retries2

This is in addition to having transport.ping_schedule: 5s. I wrote some more details on the GitHub ticket.
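
To keep that value across reboots, you could also persist it via the sysctl configuration, something along these lines (RedHat-style systems read /etc/sysctl.conf at boot):

# Persist the setting and reload the sysctl configuration
echo 'net.ipv4.tcp_retries2 = 6' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p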

@javanna
Can a client node (not master, not data) work as a CCS node?

Hi @luxiaoxun, note that these days nodes also have the ingest role. We moved away from the "client node" terminology; we now call such nodes "coordinating-only". Coordinating is exactly what a CCS node that holds no data does (it behaves exactly the same as when a search request is sent to it; the fact that the request may fan out to other clusters makes no difference), so yes, a coordinating-only node can definitely work as a CCS node.
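
For example, on 6.x a coordinating-only node is simply one with all roles disabled in elasticsearch.yml, and it can still open the remote connections needed for CCS:

# elasticsearch.yml -- coordinating-only node on 6.x
node.master: false
node.data: false
node.ingest: false

# allow this node to connect to the configured remote clusters (default is true)
search.remote.connect: true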

Cheers
Luca

Thanks for your reply.
To work around the issue I mentioned before, I am trying to add client nodes in both clusters to act as CCS nodes, so that I only have to change the network config on the client nodes; I don't want to change the network config on the master and data nodes.
