Master goes down; even after re-election, cluster is unresponsive

I have a 5 node cluster with 3 master eligible nodes and 2 dedicated data nodes.

In the current cluster state, the master node left owing to long GC pauses.

After re-election, the master role was assigned to another master-eligible node. I get the following exception in my Elasticsearch logs on one of the dedicated data nodes:

java.lang.IllegalStateException: cluster state from a different master than the current one, rejecting (received {cls-es-slave1}{Bh9_gR2jRqiLn3IjNfHYpA}{}{}{master=true}, current {cls-es-master}{Z1O52E-fRIu2itHXL3l1Xg}{}{}{master=true})

Can anyone explain what's going on?

Thanks in advance.

What version?

@warkolm Elasticsearch version: 2.4.5

Do you have minimum masters set?

No. But I assumed that once a new master is elected, the cluster should become healthy again automatically.

Suppose I had minimum master nodes set to 2, with 3 master-eligible nodes, one of which is the current master. If the current master goes down and a new one is elected, the cluster should automatically be up and healthy again.

Am I missing something here?

If you don't have min masters then it's possible that you had a split brain.

Even if I had set minimum masters to 2, I would still have faced this situation, right?

After all, a cluster state was being published by a master other than the one the dedicated data node considered current.

Well, it looks like you have multiple masters sending out conflicting updates. If you had min masters set, only one master would ever be active and there wouldn't be a conflict.
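To illustrate how multiple masters can arise, here is a minimal, hypothetical model of election during a network partition (or a long GC pause that makes a node look gone). It assumes a partition can elect a master only if it sees at least `min_masters` master-eligible nodes, and uses `min_masters=1` to roughly approximate leaving the setting unset. The node names and the `electing_partitions` helper are made up for illustration; this is not how Elasticsearch implements zen discovery.

```python
# Hypothetical model: a partition of master-eligible nodes can elect a master
# only if it contains at least `min_masters` of them.
# Assumption: min_masters=1 approximates "minimum master nodes not set".

def electing_partitions(partitions, min_masters):
    """Return the partitions (lists of node names) able to elect their own master."""
    return [p for p in partitions if len(p) >= min_masters]

# Three master-eligible nodes (hypothetical names) split 2 / 1.
partitions = [["master-a", "master-b"], ["master-c"]]

without_quorum = electing_partitions(partitions, min_masters=1)
with_quorum = electing_partitions(partitions, min_masters=2)

print(len(without_quorum))  # 2: both sides elect a master (split brain)
print(len(with_quorum))     # 1: only the majority side elects a master
```

With a majority requirement, two disjoint partitions can never both reach it, so at most one master publishes cluster state at a time.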

But if I had min masters set, wouldn't the cluster have been inoperable, since it would wait for that many master-eligible nodes to join? How is this fault tolerant?

If you have 3 master-eligible nodes, min masters should be 2, so you can still lose one and maintain availability.
If you don't set it then you risk data loss and corruption.
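The arithmetic behind "3 masters means min masters is 2" is a strict majority of master-eligible nodes, `(master_eligible / 2) + 1` rounded down, which is the value recommended for `discovery.zen.minimum_master_nodes` in Elasticsearch 2.x. The sketch below just works through that formula; the function names are hypothetical helpers, not Elasticsearch APIs.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1

def can_elect_master(surviving, master_eligible):
    """True if the surviving master-eligible nodes still form a quorum."""
    return surviving >= minimum_master_nodes(master_eligible)

n = 3
print(minimum_master_nodes(n))  # 2
print(can_elect_master(2, n))   # True: one master lost, cluster stays available
print(can_elect_master(1, n))   # False: two lost, no master rather than two
```

So with 3 master-eligible nodes, losing one still leaves a quorum of 2; only losing two blocks election, which is the consistency-over-availability trade-off described in the next reply.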

It's a balance for sure, but I'd prefer consistency over availability myself, because what's the point of having access to the data if it's wrong?

You mean the above exception might still have occurred even if I had min masters set?

No. The exception is saying that there are multiple masters, so setting min masters would have prevented it.
