Cluster to cluster replication

Hi,
I'm evaluating Elastic Search and so far I'm very impressed with it,
however there are a few important features I think are missing. I will
post about the other separately but one of the most important ones is
cluster to cluster replication and it seems like it should not be
difficult to achieve based on the current implementation.
There are two motivations for replicating a whole cluster despite
having shard replication already:

  1. It allows geographic distribution of the content for performance
    and redundancy, where each location has its own cluster which is
    replicated to one or more other clusters.
  2. It provides a hot recovery in case of a data corrupting bug in ES
    itself.

The simplest implementation is a master/slave configuration where
writes are only allowed to the master cluster and manual promotion is
required in order to allow the slave to become a "master" and accept
writes. This should be easily done if every node replicated its own
writes, with their version numbers to the slave cluster/s.

A more advanced configuration would allow active/active configuration
where writes can go to both clusters but not on the same index. So
basically, every index will have its own master cluster where writes
are allowed and slaves where only reads are allowed.

The most advanced configuration is full master/master configuration
where writes are allowed to all the clusters and they can replicate
data cyclically between themselves. This is obviously the ideal
situation because it is a superset of the simpler scenarios described
above. I'm not sure if this an be done as easily but I still think
it's worth considering. It will require tagging the source cluster of
each each update and propagating it as part of the replication so
changes are not propagated back to their origin cluster. It will also
require a method for resolving conflicts, while it is possible to use
vector clocks and such, it is probably good enough to use simple
timestamps to determine the most recent updates, especially since only
full document updates are allowed.

I'm wondering if this has been considered in the past, since I
couldn't find anything about it in the mailing list?

-eran

Heya,

Yes, this certainly has been considered and discussed. Currently, you need
to do it yourself on the client side, but, its certainly planned (though
more down the road) to have cluster to cluster replication.

-shay.banon

On Mon, Jul 18, 2011 at 6:09 PM, Eran Kutner eran@gigya.com wrote:

Hi,
I'm evaluating Elastic Search and so far I'm very impressed with it,
however there are a few important features I think are missing. I will
post about the other separately but one of the most important ones is
cluster to cluster replication and it seems like it should not be
difficult to achieve based on the current implementation.
There are two motivations for replicating a whole cluster despite
having shard replication already:

  1. It allows geographic distribution of the content for performance
    and redundancy, where each location has its own cluster which is
    replicated to one or more other clusters.
  2. It provides a hot recovery in case of a data corrupting bug in ES
    itself.

The simplest implementation is a master/slave configuration where
writes are only allowed to the master cluster and manual promotion is
required in order to allow the slave to become a "master" and accept
writes. This should be easily done if every node replicated its own
writes, with their version numbers to the slave cluster/s.

A more advanced configuration would allow active/active configuration
where writes can go to both clusters but not on the same index. So
basically, every index will have its own master cluster where writes
are allowed and slaves where only reads are allowed.

The most advanced configuration is full master/master configuration
where writes are allowed to all the clusters and they can replicate
data cyclically between themselves. This is obviously the ideal
situation because it is a superset of the simpler scenarios described
above. I'm not sure if this an be done as easily but I still think
it's worth considering. It will require tagging the source cluster of
each each update and propagating it as part of the replication so
changes are not propagated back to their origin cluster. It will also
require a method for resolving conflicts, while it is possible to use
vector clocks and such, it is probably good enough to use simple
timestamps to determine the most recent updates, especially since only
full document updates are allowed.

I'm wondering if this has been considered in the past, since I
couldn't find anything about it in the mailing list?

-eran