Continuing the discussion from github issue.
The current strategy, which seq# will keep enforcing but in easier/faster way, is that all replicas are "reset" to be an exact copy of the primary currently chosen by the master.
The situation will be resolved when the network restores and the left over primary and replica sync I would like to clearly understand the reset/sync scenarios. What triggers reset/sync?
I can think of a couple of "normal" operation scenarios
- I would think that whenever a node joins a network, the master would initiate a sync/reset.
- If a replica fails for a request, I suppose the primary should keep attempting a sync/reset, otherwise the replica might keep diverging, and at some point the master has to decommission that replica, otherwise the reads would be inconsistent.
In the case of split brain, with multi-network replicas (assuming min master nodes is set), primary-1 has been assuming that this replica R (on this third node, say N-3) has been failing (because of its allegiance to primary-2 ) but still is in the network. Hence it would attempt sync/reset. How does this protocol work? Should master-1 attempt to decommission R at some point, going by assumption (2)?
This problem will occur in a loop if R is decommissioned but another replica is installed on N-3 in its place, by the same protocol. There will be contention on N-3 for "reset"-ing replica shards by both the masters.
I suppose one way to resolve this is by letting a node choose a master if there are multiple masters. If we did this, then whenever a node loses its master, it would choose the other master, and there will be a sync/reset and all is well.
However if the node chooses its master, the other master will lose quorum, and hence cease to exist, which is a good resolution for this issue in my opinion.