TransportClient + MasterNotDiscoveredException + Split Brain


(Ronak Shah) #1

Hi,
I am using ES 2.2 with Java client 2.2.

I have a 3-node ES cluster: 2 master-eligible data nodes and 1 dedicated data node.
I have set discovery.zen.minimum_master_nodes: 2 to avoid split brain.

In my TransportClient I have configured the IPs of all 3 nodes.
In the normal case I get the expected behavior.

If I create a network disconnect between the 2 masters, I see that my cluster size is reduced from 3 to 2.
One node is isolated and constantly tries to reconnect to the cluster.

But my TransportClient then fails with MasterNotDiscoveredException.
Note that my writes were already running before I created the disconnect.

I was assuming that TransportClient would only use nodes that are connected to the cluster.
Here, it should rebalance itself to use only the remaining 2.

But it seems that it is blindly doing round-robin across all the nodes, and the isolated one is throwing this exception.

Isn't that incorrect? What am I missing?


(David Pilato) #2

The client tries any host it is aware of.
Maybe with the sniff option (client.transport.sniff), after some seconds, the transport client will stop sending requests to the failing node. But I did not check the code.


(Jörg Prante) #3

After a node failure, for a short dead time, TransportClient continues its "blind" round-robin over all connected nodes. It's the sniff mechanism that disconnects from unresponsive nodes in the background, after a delay. Since split-brain nodes are still responsive, TransportClient happily continues to use them. You have to remove the node from the TransportClient node list yourself; see the method removeTransportAddress().
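
To make the point concrete, here is a toy model of the selection logic described above. It is not the Elasticsearch client code. ToyNode, ToyRoundRobinClient and their fields are made-up names, and the only behavior assumed is what this post states: requests go round-robin over connected nodes, and the background pass drops only nodes that stop responding.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a cluster node (hypothetical, not an Elasticsearch class).
class ToyNode {
    final String name;
    final boolean respondsToPing; // transport layer still answers
    final boolean seesMaster;     // node currently belongs to a cluster with an elected master

    ToyNode(String name, boolean respondsToPing, boolean seesMaster) {
        this.name = name;
        this.respondsToPing = respondsToPing;
        this.seesMaster = seesMaster;
    }
}

// Toy stand-in for the client's node selection (hypothetical, not TransportClient).
class ToyRoundRobinClient {
    private final List<ToyNode> nodes = new ArrayList<>();
    private int next = 0;

    void addNode(ToyNode node) {
        nodes.add(node);
    }

    // Models the background pass: only nodes that do not answer at all are dropped.
    void pruneUnresponsive() {
        nodes.removeIf(n -> !n.respondsToPing);
    }

    int size() {
        return nodes.size();
    }

    // Picks the next node in round-robin order and reports what a write through it would yield.
    String write() {
        if (nodes.isEmpty()) {
            return "no node available";
        }
        ToyNode node = nodes.get(next % nodes.size());
        next++;
        return node.seesMaster
                ? "ok via " + node.name
                : "MasterNotDiscoveredException via " + node.name;
    }
}
```

With three nodes where the third is isolated but still answering (respondsToPing = true, seesMaster = false), pruneUnresponsive() keeps all three, so every third write() lands on the isolated node and reports the exception. That is the behavior seen in the original question.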


(Ronak Shah) #4

Thanks Jörg,
If I remove it, then I have to keep polling so I can add it back when it re-joins.
That kind of sucks.

Shouldn't TransportClient do this internally?
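
For reference, the polling I would have to write looks roughly like the sketch below. NodeList and NodeListReconciler are hypothetical names. In a real client, add/remove would delegate to TransportClient.addTransportAddress() and removeTransportAddress(), and the hasMaster probe (some per-node check that the node currently sees an elected master) is an assumption that is not implemented here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical wrapper around the client's node list.
interface NodeList {
    void add(String address);
    void remove(String address);
}

// Keeps the client's node list in step with a per-node health probe.
class NodeListReconciler {
    private final NodeList nodeList;
    private final Predicate<String> hasMaster; // hypothetical probe: true if the node sees a master
    private final List<String> allAddresses;
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Assumes every address has already been added to the client by the caller.
    NodeListReconciler(NodeList nodeList, Predicate<String> hasMaster, Collection<String> addresses) {
        this.nodeList = nodeList;
        this.hasMaster = hasMaster;
        this.allAddresses = new ArrayList<>(addresses);
        this.active.addAll(addresses);
    }

    // One polling pass: remove nodes that lost their master, re-add nodes that recovered.
    void reconcile() {
        for (String address : allAddresses) {
            boolean healthy = hasMaster.test(address);
            if (healthy && active.add(address)) {
                nodeList.add(address);       // node re-joined the cluster
            } else if (!healthy && active.remove(address)) {
                nodeList.remove(address);    // node is isolated, stop routing to it
            }
        }
    }

    // Snapshot of the addresses currently considered usable.
    Set<String> activeAddresses() {
        return Collections.unmodifiableSet(new HashSet<>(active));
    }
}
```

reconcile() would then run on a timer, for example via ScheduledExecutorService.scheduleWithFixedDelay with an interval of a few seconds (the interval is a guess). Requests sent between two passes can still hit the isolated node, so the caller needs to retry on MasterNotDiscoveredException anyway.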

