Handling TCP disconnect

Hi all.

We have been seeing an issue at a customer site where nodes drop out of the
cluster.

I've managed to recreate a similar scenario in our test environment by using
tcpkill to kill the TCP traffic between a node and the master node. The setup
consists of 3 nodes.
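
For reference, the traffic was killed with a tcpkill invocation along these
lines (the interface name is an assumption; the hosts and port match the
transport addresses in the logs below):

  tcpkill -i eth0 host 192.168.4.1 and host 192.168.4.2 and port 9300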

When using tcpkill to kill the TCP connection between MASTER and NODE2, we get
this behaviour:

[tcpkill]


MASTER NODE:


----- Discovery log -----

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] [node ]
[[loadtest-appserver2][R5jh4N7OTmykaFRHUx70AA][inet[/192.168.4.2:9300]]{local=false}]
transport disconnected (with verified connect)

----- Network log -----

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] close connection exception caught on
transport layer [[id: 0xeb1569e7, /192.168.4.2:50232 => /192.168.4.1:9300]],
disconnecting from relevant node
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0xeb1569e7, /
192.168.4.2:50232 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] close connection exception caught on
transport layer [[id: 0x90606514, /192.168.4.1:33969 => /192.168.4.2:9300]],
disconnecting from relevant node
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x4ca1672c, /
192.168.4.2:50223 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0xa6bd9ab0, /
192.168.4.2:50226 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0xe9c0cc49, /
192.168.4.2:50225 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x246a84b0, /
192.168.4.2:50230 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x9d07abe3, /
192.168.4.2:50231 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x82aea2d8, /
192.168.4.2:50229 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x3ea2f9d0, /
192.168.4.2:50228 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x922efe9c, /
192.168.4.2:50227 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] channel closed: [id: 0x5ccc53ad, /
192.168.4.2:50224 => /192.168.4.1:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] disconnected from
[[loadtest-appserver2][R5jh4N7OTmykaFRHUx70AA][inet[/192.168.4.2:9300]]{local=false}],
channel closed event


NODE 2


----- Discovery log -----

FINE: [loadtest-appserver2] [master] stopping fault detection against
master [[loadtest-appserver1][8xIvXqkSQuCzAt4lyTpJRQ][inet[/192.168.4.1:9300]]{local=false}],
reason [master failure, transport disconnected (with verified connect)]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalInfo
INFO: [loadtest-appserver2] master_left
[[loadtest-appserver1][8xIvXqkSQuCzAt4lyTpJRQ][inet[/192.168.4.1:9300]]{local=false}],
reason [transport disconnected (with verified connect)]

----- Network log -----

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] disconnected from
[[loadtest-appserver1][8xIvXqkSQuCzAt4lyTpJRQ][inet[/192.168.4.1:9300]]{local=false}],
channel closed event


NODE 3


----- Discovery log -----

Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver3] [master] restarting fault detection against
master [[loadtest-appserver2][R5jh4N7OTmykaFRHUx70AA][inet[/192.168.4.2:9300]]{local=false}],
reason [new cluster state received and we are monitoring the wrong master
[[loadtest-appserver1][8xIvXqkSQuCzAt4lyTpJRQ][inet[/192.168.4.1:9300]]{local=false}]]

----- Network log -----

FINEST: [loadtest-appserver3] channel opened: [id: 0x659ca590, /
192.168.4.3:45096 => /192.168.4.3:9300]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver3] disconnected from
[[loadtest-appserver2][R5jh4N7OTmykaFRHUx70AA][inet[/192.168.4.2:9300
]]{local=false}]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver3] connected to node
[[loadtest-appserver2][R5jh4N7OTmykaFRHUx70AA][inet[/192.168.4.2:9300
]]{local=false}]
Sep 24, 2013 10:32:28 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver3] disconnected from
[[loadtest-appserver1][8xIvXqkSQuCzAt4lyTpJRQ][inet[/192.168.4.1:9300
]]{local=false}]

[/tcpkill]

When simply taking down the network interface of NODE2 instead, we get the
behaviour below.
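
(For reference, the interface was taken down with something like ifdown eth1
on NODE2; the interface name here is an assumption.)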

[ifdown interface]


MASTER NODE:


----- Discovery log -----

FINEST: [loadtest-appserver1] [node ] failed to ping
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
retry [1] out of [3]
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/fd/ping]
request_id [230] timed out after [30000ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:351)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Sep 24, 2013 10:46:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] [node ] failed to ping
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
retry [2] out of [3]
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/fd/ping]
request_id [267] timed out after [30000ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:351)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Sep 24, 2013 10:47:24 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] [node ] failed to ping
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
retry [3] out of [3]
org.elasticsearch.transport.ReceiveTimeoutTransportException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/fd/ping]
request_id [300] timed out after [30000ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:351)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Sep 24, 2013 10:47:24 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] [node ] failed to ping
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
tried [3] times, each with maximum [30s] timeout
Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] failed to send cluster state to
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
should be detected as failed soon...
org.elasticsearch.transport.NodeDisconnectedException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/publish]
disconnected

Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] failed to send cluster state to
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
should be detected as failed soon...
org.elasticsearch.transport.NodeDisconnectedException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/publish]
disconnected

Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] failed to send cluster state to
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300]]{local=false}],
should be detected as failed soon...
org.elasticsearch.transport.NodeDisconnectedException:
[loadtest-appserver2][inet[/192.168.4.2:9300]][discovery/zen/publish]
disconnected

----- Network log -----

Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver1] disconnected from
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300
]]{local=false}]
Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalTrace
FINEST: [loadtest-appserver1] close connection exception caught on
transport layer [[id: 0x1a51ae87, /192.168.4.1:34050 :> /192.168.4.2:9300]],
disconnecting from relevant node
java.nio.channels.ClosedChannelException
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:406)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:374)
at
org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:58)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
at
org.elasticsearch.common.netty.channel.Channels.close(Channels.java:812)
at
org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
at
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.closeChannelsAndWait(NettyTransport.java:902)
at
org.elasticsearch.transport.netty.NettyTransport$NodeChannels.close(NettyTransport.java:891)
at
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:759)
at
org.elasticsearch.transport.TransportService.disconnectFromNode(TransportService.java:137)
at
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask$1.run(InternalClusterService.java:406)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)


NODE 2


----- Discovery log -----

FINE: [loadtest-appserver2] [master] failed to ping
[[loadtest-appserver1][vtpBrETrTESCcia7XHM_qA][inet[/192.168.4.1:9300]]{local=false}],
tried [3] times, each with maximum [30s] timeout
Sep 24, 2013 10:47:24 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] [master] stopping fault detection against
master [[loadtest-appserver1][vtpBrETrTESCcia7XHM_qA][inet[/192.168.4.1:9300]]{local=false}],
reason [master failure, failed to ping, tried [3] times, each with maximum
[30s] timeout]
Sep 24, 2013 10:47:24 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalInfo
INFO: [loadtest-appserver2] master_left
[[loadtest-appserver1][vtpBrETrTESCcia7XHM_qA][inet[/192.168.4.1:9300]]{local=false}],
reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
Sep 24, 2013 10:47:24 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] [master] restarting fault detection against
master [[loadtest-appserver3][0ljeiF-RRZy65dhxjJ22DQ][inet[/192.168.4.3:9300]]{local=false}],
reason [possible elected master since master left (reason = failed to ping,
tried [3] times, each with maximum [30s] timeout)]
Sep 24, 2013 10:48:55 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] [master] failed to ping
[[loadtest-appserver3][0ljeiF-RRZy65dhxjJ22DQ][inet[/192.168.4.3:9300]]{local=false}],
tried [3] times, each with maximum [30s] timeout
Sep 24, 2013 10:48:55 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] [master] stopping fault detection against
master [[loadtest-appserver3][0ljeiF-RRZy65dhxjJ22DQ][inet[/192.168.4.3:9300]]{local=false}],
reason [master failure, failed to ping, tried [3] times, each with maximum
[30s] timeout]
Sep 24, 2013 10:48:55 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalInfo
INFO: [loadtest-appserver2] master_left
[[loadtest-appserver3][0ljeiF-RRZy65dhxjJ22DQ][inet[/192.168.4.3:9300]]{local=false}],
reason [failed to ping, tried [3] times, each with maximum [30s] timeout]

----- Network log -----

Sep 24, 2013 10:48:55 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver2] disconnected from
[[loadtest-appserver3][0ljeiF-RRZy65dhxjJ22DQ][inet[/192.168.4.3:9300
]]{local=false}]


NODE 3


----- Discovery log -----

Nothing relevant

----- Network log -----

Sep 24, 2013 10:48:54 AM
org.elasticsearch.common.logging.slf4j.Slf4jESLogger internalDebug
FINE: [loadtest-appserver3] disconnected from
[[loadtest-appserver2][WzI53XwyQF21NrbrCy76CQ][inet[/192.168.4.2:9300
]]{local=false}]

[/ifdown interface]

Scenario 2, taking down the network interface, is handled well: the nodes try
a number of times to reconnect, each attempt with a set timeout. Scenario 1,
on the other hand, is not very graceful. The master is immediately deposed and
a new master is elected (this can probably be mitigated by the
minimum_master_nodes setting, but that is not acceptable in our scenario).
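
For reference, the retry/timeout behaviour seen in scenario 2 appears to
correspond to the zen fault-detection settings. The values below are, as far
as I can tell, the defaults that match the "[3] times ... [30s] timeout"
messages in the logs above (treat them as an assumption and verify against
the documentation for your version):

  discovery.zen.fd.ping_interval: 1s
  discovery.zen.fd.ping_timeout: 30s
  discovery.zen.fd.ping_retries: 3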

Thus, a "normal" network disconnect is handled fine, but killing the TCP
channel causes a lot of trouble. Is there any way to handle this more
gracefully, and is this intended behaviour?
While finding the reason for the TCP channel disconnects at our customer is
our first priority, the consequences of this tcpkill look like a critical
weakness in the solution.

Regards,

Runar Myklebust
Enonic AS
Senior Developer

http://enonic.com/download


For the current situation, using minimum_master_nodes is the preferred and
easiest method to prevent network splits in an ES cluster.
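
For the 3-node test cluster described above, that would mean something along
these lines in elasticsearch.yml (a minimal sketch; the value should be a
quorum of the master-eligible nodes, i.e. (n / 2) + 1):

  discovery.zen.minimum_master_nodes: 2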

ES discovery was designed to cope with node failures. Network failures are
another topic, because the visibility of the cluster members becomes
different for every node, and in some situations the nodes must work hard to
agree on how to form a conflict-free cluster.

So what could be done to improve the situation from the ground up? The
keyword to look for is Byzantine fault tolerance (BFT). BFT can handle not
only faults but also malicious nodes, and it is a very complex beast in
practice. In the future, I hope there will be an integration of
Byzantine-fault-tolerant discovery modules, such as BFT-SMaRt:
https://code.google.com/p/bft-smart/

In the meantime, you can experiment with the ZooKeeper plugin to replace zen
discovery. ZooKeeper works with checksums that can catch non-malicious
Byzantine faults, including network failures.
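
A rough sketch of what the discovery configuration might look like with the
sonian/elasticsearch-zookeeper plugin (the setting names below are from
memory and should be treated as an assumption; verify them against the
plugin's README, and replace zk-host:2181 with your ZooKeeper ensemble):

  discovery.type: com.sonian.elasticsearch.zookeeper.discovery.ZooKeeperDiscoveryModule
  sonian.elasticsearch.zookeeper.client.host: zk-host:2181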

Jörg
