TransportClient + MasterNotDiscoveredException + Split Brain

ronakmshah · March 8, 2016, 1:51am

Hi,
I am using ES 2.2 with Java client 2.2.

I have a 3-node ES cluster: 2 master+data nodes and 1 dedicated data node.
I have set discovery.zen.minimum_master_nodes: 2 to avoid split brain.
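For reference, the usual guidance for discovery.zen.minimum_master_nodes is a quorum of the master-eligible nodes, (master_eligible_nodes / 2) + 1 with integer division. The sketch below only does that arithmetic; the class and method names are illustrative, not part of any Elasticsearch API. It also shows what the formula implies for this layout: with only 2 master-eligible nodes the quorum is 2, so once the two masters cannot see each other, neither side has enough master-eligible nodes to elect a master on its own.

```java
/** Illustrative helper (hypothetical name): quorum for discovery.zen.minimum_master_nodes. */
final class MinimumMasterNodes {
    private MinimumMasterNodes() {}

    /** Majority of master-eligible nodes: (n / 2) + 1 using integer division. */
    static int quorum(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    /** Whether a partition holding this many master-eligible nodes reaches the configured minimum. */
    static boolean canElectMaster(int eligibleInPartition, int minimumMasterNodes) {
        return eligibleInPartition >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int q = quorum(2); // the 2 master+data nodes in this cluster
        System.out.println("minimum_master_nodes = " + q);
        // After the two masters are split, each side holds one master-eligible node.
        System.out.println("side A can elect: " + canElectMaster(1, q));
        System.out.println("side B can elect: " + canElectMaster(1, q));
    }
}
```

Running main prints a quorum of 2 and false for both sides, which may be worth keeping in mind when reading the MasterNotDiscoveredException below.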

In my TransportClient I have given the IPs of all 3 nodes to connect to.
In the normal case I get the expected behavior.

If I create a network disconnect between the 2 masters, I see that my cluster size is reduced from 3 to 2.
One node is isolated and constantly trying to reconnect to the cluster.

But my TransportClient fails with a MasterNotDiscoveredException.
Note that writes were in progress before I created the disconnect.

I was assuming that the TransportClient would only use nodes that are connected to the cluster.
Here, it should rebalance itself to use only the 2 remaining nodes.

But it seems that it is blindly doing round robin across all the nodes, and the isolated one is throwing this exception.

Isn't that incorrect? What am I missing?
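To make the behavior described above concrete, here is a toy model of a client that cycles over a fixed list of configured addresses without checking whether each node is still part of the cluster. It is not the actual TransportClient code; the class and node names are hypothetical. With three configured nodes, every third request lands on the isolated one, which is where the exception would surface.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical): "blind" round-robin over a fixed list of configured nodes. */
final class BlindRoundRobin {
    private final List<String> nodes;
    private int next = 0;

    BlindRoundRobin(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Returns the node that would receive the next request, ignoring cluster membership. */
    String pick() {
        String node = nodes.get(next);
        next = (next + 1) % nodes.size();
        return node;
    }

    public static void main(String[] args) {
        // Hypothetical node names; "node-2" plays the isolated master.
        BlindRoundRobin client = new BlindRoundRobin(Arrays.asList("node-1", "node-2", "node-3"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + client.pick());
        }
    }
}
```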

dadoonet · March 8, 2016, 5:56am

The client tries any host it is aware of.
Maybe with the sniff option, after some seconds, the transport might stop sending requests to the failing node. But I did not check the code.

jprante · March 8, 2016, 8:22am

After a node failure, for a short dead time, the TransportClient continues "blind" round-robin to all connected nodes. It's the sniff mechanism that disconnects from unresponsive nodes in the background, after a delay. Since split-brain nodes are still responsive, the TransportClient happily continues. You have to remove the node from the TransportClient node list yourself; see the method removeTransportAddress().
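The distinction drawn above can be modeled with two filters over a hypothetical per-node status: one keeps every node that answers at the transport layer (roughly what an unresponsiveness check catches), the other also requires the node to report an elected master. Only the second drops a split-off node. The NodeStatus and NodeFilters classes are illustrative assumptions; how you actually learn whether a given node sees a master is left out, and in a real client the removal itself would go through removeTransportAddress().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of what a health probe learned about one configured node. */
final class NodeStatus {
    final String address;
    final boolean responsive; // answers at the transport layer
    final boolean hasMaster;  // reports an elected master in its own view of the cluster

    NodeStatus(String address, boolean responsive, boolean hasMaster) {
        this.address = address;
        this.responsive = responsive;
        this.hasMaster = hasMaster;
    }
}

/** Illustrative filters: which configured nodes a client would keep routing to. */
final class NodeFilters {
    private NodeFilters() {}

    /** Keeps every node that answers; a split-off node still passes this check. */
    static List<String> responsiveOnly(List<NodeStatus> statuses) {
        List<String> kept = new ArrayList<>();
        for (NodeStatus s : statuses) {
            if (s.responsive) kept.add(s.address);
        }
        return kept;
    }

    /** Also requires an elected master, which drops the isolated node. */
    static List<String> responsiveWithMaster(List<NodeStatus> statuses) {
        List<String> kept = new ArrayList<>();
        for (NodeStatus s : statuses) {
            if (s.responsive && s.hasMaster) kept.add(s.address);
        }
        return kept;
    }
}
```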

ronakmshah · March 8, 2016, 8:49pm

Thanks Jorg,
If I remove it, then I have to keep polling so I can add it back when it re-joins.
That kind of sucks.

Shouldn't the TransportClient do this internally?
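If the polling does end up in application code, it can stay small. A minimal sketch, assuming some per-node health check exists (the map of booleans stands in for its results): a reconcile step computes which registered addresses to drop and which configured ones to add back. A ScheduledExecutorService could run it periodically, with the two sets mapped to removeTransportAddress() and addTransportAddress() on the real client. NodeReconciler, plan and apply are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical reconcile step: which configured nodes to drop or re-add. */
final class NodeReconciler {
    final Set<String> toRemove = new LinkedHashSet<>();
    final Set<String> toAdd = new LinkedHashSet<>();

    /**
     * @param active  addresses currently registered with the client
     * @param healthy latest probe result per configured address (true = part of the cluster)
     */
    static NodeReconciler plan(Set<String> active, Map<String, Boolean> healthy) {
        NodeReconciler plan = new NodeReconciler();
        for (Map.Entry<String, Boolean> e : healthy.entrySet()) {
            boolean ok = Boolean.TRUE.equals(e.getValue());
            boolean registered = active.contains(e.getKey());
            if (registered && !ok) plan.toRemove.add(e.getKey()); // isolated: stop routing to it
            if (!registered && ok) plan.toAdd.add(e.getKey());    // re-joined: route to it again
        }
        return plan;
    }

    /** Stand-in for calling removeTransportAddress()/addTransportAddress() on a real client. */
    static void apply(Set<String> active, NodeReconciler plan) {
        active.removeAll(plan.toRemove);
        active.addAll(plan.toAdd);
    }
}
```

Keeping the plan separate from the side effects makes the decision logic easy to test without a cluster.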
