Multi-data center deployments

Assuming an ElasticSearch deployment has only one shard but there are
several nodes, in two discrete data centres.

If communications between the data centres is interrupted, each node
can continue to operate (because with one shard, all the data is
"local") but what happens to changes that are written to separate
nodes? i.e. when inter-data-centre communications are restored, and
the nodes discover each other again:

  • Do writes that have happened in centre A get replicated to centre B?
  • What happens if document X is modified in centre A AND in centre B?

Thanks in advance.

Glynn

If you have one shard and I assume a replica, if the connection between the
two is interrupted, then the side with the replica shard will promote that
shard to become primary, and by default, will allow for writes. It will
also, assuming you have enough nodes, will allocate the additional shard to
the cluster.

Cross data center replication has been discussed in the mailing list, its
something that requires specific features to be developed to address it.

On Fri, Aug 12, 2011 at 7:56 AM, Glynn glynn.bird@gmail.com wrote:

Assuming an Elasticsearch deployment has only one shard but there are
several nodes, in two discrete data centres.

If communications between the data centres is interrupted, each node
can continue to operate (because with one shard, all the data is
"local") but what happens to changes that are written to separate
nodes? i.e. when inter-data-centre communications are restored, and
the nodes discover each other again:

  • Do writes that have happened in centre A get replicated to centre B?
  • What happens if document X is modified in centre A AND in centre B?

Thanks in advance.

Glynn

On Sat, Aug 13, 2011 at 9:42 AM, Shay Banon kimchy@gmail.com wrote:

If you have one shard and I assume a replica, if the connection between the
two is interrupted, then the side with the replica shard will promote that
shard to become primary, and by default, will allow for writes. It will
also, assuming you have enough nodes, will allocate the additional shard to
the cluster.

Does it autmatically replicate once the old shard is back in the
cluster? I mean how does it keep the data in sync because the new
shard doesn't have all the data.

Is there a link that gives this information in detail? It will help me
understand this.

Cross data center replication has been discussed in the mailing list, its
something that requires specific features to be developed to address it.

On Fri, Aug 12, 2011 at 7:56 AM, Glynn glynn.bird@gmail.com wrote:

Assuming an Elasticsearch deployment has only one shard but there are
several nodes, in two discrete data centres.

If communications between the data centres is interrupted, each node
can continue to operate (because with one shard, all the data is
"local") but what happens to changes that are written to separate
nodes? i.e. when inter-data-centre communications are restored, and
the nodes discover each other again:

  • Do writes that have happened in centre A get replicated to centre B?
  • What happens if document X is modified in centre A AND in centre B?

Thanks in advance.

Glynn

Does it autmatically replicate once the old shard is back in the
cluster? I mean how does it keep the data in sync because the new
shard doesn't have all the data.

If you have one replica, then you have two copies of the data: the
primary, and the replica.

Have a look at Shay's presentation at Berlin Buzzwords - it explains all
of this in much more detail:

clint