Hi All,
I've been doing some testing of a 3-node cluster with minimum_master_nodes set to 2.
The summary outcome is:
Versions 19.7 and below recover from multiple network disconnects (without rebooting nodes) and honor the minimum_master_nodes setting, avoiding a split brain. Version 19.8 recovers from the first network disconnect but fails to recover from the second: the disconnected node elects itself as master and a split brain occurs.
Test setup:
3 nodes with the following config:
cluster.name: splitbrain
node.name: node[1,2,3]
discovery:
  zen:
    minimum_master_nodes: 2
Each node has a different node.name.
Test steps:
1. Start node1, node2, and node3
2. Create an index called "test1" with default shards/replicas (5/1) - see the sketch just after these steps
3. Yank the network cable from node3
4. After 30 seconds, node3 gets a ping failure
5. After another 30 seconds, node3 gets another ping failure
6. After another 30 seconds, node3 reports that there are not enough master nodes: "[WARN ][discovery.zen ] [node3] not enough master nodes after master left (reason = transport disconnected (with verified connect)), current nodes: {[node3][PKXJFl57R82PNhhD0p-n1Q][inet[/192.168.7.22:9300]],}"
7. Node3 then goes into a loop pinging the other nodes
8. Reconnect the network cable
9. Node3 rejoins the cluster cleanly
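For completeness, step 2 is nothing special - a plain index creation with all defaults. Roughly the following (the host is illustrative; any reachable node on the standard HTTP port 9200 would do):

# Step 2 sketch: create "test1" with default settings (5 shards / 1 replica).
# Hostname below is illustrative; assumes the standard HTTP port 9200.
import json
import urllib.request

req = urllib.request.Request("http://node1:9200/test1", data=b"{}", method="PUT")
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # expect an acknowledged response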
With versions 19.7 and below, I can repeat steps 2-9 over and over. On each run through, the disconnected node behaves the same way: it prints the above message and re-joins the cluster cleanly.
Note that I'm not rebooting/restarting any nodes before repeating steps 2-9.
With version 19.8, the first run through of steps 2-9 works as expected. On the second run through, the disconnected node behaves differently: the second ping failure (step 5) doesn't get logged. Instead, the following log message starts repeating:
[2012-07-24 11:55:00,270][WARN ][cluster.service ] [node3] failed to reconnect to node [node2][aPugNxMpTvCUDvjNfYKFjA][inet[/192.168.7.132:9301]]
org.elasticsearch.transport.ConnectTransportException: [node2][inet[/192.168.7.132:9301]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:563)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:505)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:483)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:128)
    at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:377)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:532)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:139)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:102)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:573)
    at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:642)
    at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:205)
    at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:230)
    at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:550)
    ... 7 more
The node then elects itself as master and still reports itself as connected to one of the other nodes. Upon reconnecting the network, we have a split brain: node3 (master) thinks it's connected to node2, while node1 (master) and node2 are connected to each other and know nothing about node3 (nor do they log anything about node3 upon re-connection).
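For what it's worth, the way I confirm the two masters is to ask each node for its view of the cluster state and compare the master it reports; roughly the following (hostnames illustrative, standard HTTP port 9200 assumed):

# Sketch: ask each node which node it currently considers master.
# Hostnames are illustrative; assumes the standard HTTP port 9200.
import json
import urllib.request

for host in ("node1", "node2", "node3"):
    state = json.loads(
        urllib.request.urlopen("http://%s:9200/_cluster/state" % host).read()
    )
    master = state["nodes"][state["master_node"]]["name"]
    print("%s reports master = %s" % (host, master))
# In the bad case, node1 and node2 agree on one master while node3 reports itself.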
It's a long shot, but could the netty upgrade in 19.8 have caused this error?
We could try to detect when a network error has occurred and reboot the nodes, but this feels like a step back, as pre-19.8 we were resilient to multiple network disconnects without rebooting.
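If we did go down that road, the detection would boil down to something like the watchdog below (a rough sketch only: the expected node count, check interval, and restart command are all placeholders, and it assumes the local node answers HTTP on 9200):

# Rough watchdog sketch: restart the local node if its view of the cluster
# stays below the expected size for several checks in a row.
# EXPECTED_NODES, the interval and the restart command are placeholders.
import json
import subprocess
import time
import urllib.request

EXPECTED_NODES = 3
CHECK_INTERVAL = 30            # seconds between health checks
BAD_CHECKS_BEFORE_RESTART = 4

bad = 0
while True:
    try:
        health = json.loads(
            urllib.request.urlopen(
                "http://localhost:9200/_cluster/health", timeout=10
            ).read()
        )
        bad = bad + 1 if health["number_of_nodes"] < EXPECTED_NODES else 0
    except Exception:
        bad += 1               # node not answering at all also counts as bad
    if bad >= BAD_CHECKS_BEFORE_RESTART:
        subprocess.call(["service", "elasticsearch", "restart"])  # placeholder
        bad = 0
    time.sleep(CHECK_INTERVAL)

But that kind of babysitting is exactly what minimum_master_nodes was meant to make unnecessary.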
Warm Regards,
Owen Butler