My understanding is that recovery from Split Brain situations is
troublesome, and we are encouraged to ensure that a cluster is only active
if there is a quorum of candidate masters. We do that by setting
discovery.zen.minimum_master_nodes to (cm / 2) + 1 (integer division), that
is, a true majority of the candidate masters.
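As a concrete sketch (the cluster size here is an assumption for
illustration): with cm = 10 master-eligible nodes, the quorum is
(10 / 2) + 1 = 6, set in elasticsearch.yml:

```yaml
# elasticsearch.yml - illustrative values, assuming 10 master-eligible nodes
# quorum = (10 / 2) + 1 = 6 (integer division)
discovery.zen.minimum_master_nodes: 6
```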
I also see that many folks want to enable resilience in the event of
disaster by running a cluster across two data centres. Lose one data centre
and we just keep running in the other - we still have half our servers, so
with correct over-provisioning we can cope with the workload. No need to
rebuild the cluster and restore from backup. We have what might be called
Live/Live Disaster Recovery (DR).
I see two issues with this approach to DR. If we lose half our candidate
master nodes, by definition we cannot achieve a quorum - we won't have a
true majority.
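The arithmetic behind this, as a sketch (assuming 10 candidate masters split
evenly across the two data centres - the numbers are illustrative):

```shell
# 10 candidate masters, 5 in each data centre (illustrative numbers)
cm=10
quorum=$(( cm / 2 + 1 ))   # minimum_master_nodes = 6
surviving=$(( cm / 2 ))    # one data centre lost: 5 masters remain
# 5 < 6: the surviving half can never form a quorum on its own
if [ "$surviving" -lt "$quorum" ]; then
  echo "no quorum possible"
fi
```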
A slightly more subtle problem is that we probably also set
gateway.recover_after_nodes to a high value (say 8 of 10), so that in
normal running with a few missing nodes we don't inadvertently get into
shard shuffling. Once again, with the loss of half our estate we will not
have enough nodes to satisfy that setting.
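For instance (a sketch for the hypothetical 10-node cluster above; the
expected_nodes line is an addition I'd typically pair with it):

```yaml
# elasticsearch.yml - illustrative recovery settings for a 10-node cluster
gateway.recover_after_nodes: 8   # don't begin recovery until 8 nodes have joined
gateway.expected_nodes: 10       # recover immediately once all 10 are present
```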
My conclusion is that while Live/Live operation across two data centres is
possible, the transition to a reduced state requires reconfiguration; the
system will not "just keep running" unless we open ourselves to Split Brain
and replica shuffling.
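For what it's worth, minimum_master_nodes is dynamically updatable via the
cluster settings API, so the reconfiguration need not mean a rolling restart
in every case (a sketch; the host and the new value are assumptions for a
5-node surviving half):

```shell
# Sketch: lower the quorum on the surviving half (hostname and value are
# assumptions). Note this API call needs an elected master to succeed;
# if the survivors cannot elect one, elasticsearch.yml must be edited and
# the nodes restarted instead - exactly the reconfiguration burden above.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "discovery.zen.minimum_master_nodes": 3 }
}'
```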
It's not recommended to run cross-DC clusters, for these reasons (and more).
You'd be better off having two clusters and then syncing data between them.
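One way to sync between two clusters is the snapshot/restore API (a sketch;
the repository name, filesystem path, and host names are assumptions, and
the shared path must be visible to all nodes in both clusters):

```shell
# Register a shared filesystem snapshot repository on the primary cluster
# (repository name, path, and hosts are illustrative assumptions)
curl -XPUT 'http://primary:9200/_snapshot/dr_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/dr_backup" }
}'
# Take a snapshot, then restore it on the secondary cluster
curl -XPUT 'http://primary:9200/_snapshot/dr_backup/snap_1?wait_for_completion=true'
curl -XPOST 'http://secondary:9200/_snapshot/dr_backup/snap_1/_restore'
```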
On 11 December 2014 at 17:44, David Artus djna01@gmail.com wrote:
If I remember correctly, version 1.4 can turn nodes that can't connect to
the cluster into read-only mode.
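This is presumably the discovery.zen.no_master_block setting, from memory of
the 1.4 docs ("write" rejects writes but still serves reads when no master
is reachable; the default is "all", which blocks both):

```yaml
# elasticsearch.yml - reject writes but keep serving reads when the node
# has no master (setting introduced around 1.4; default is "all")
discovery.zen.no_master_block: write
```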
On 11 December 2014 at 20:00, Elvar Böðvarsson elvarb@gmail.com wrote: