Cross data centre backup and recovery

Hi,
Can a separate cluster be completely recovered by using rsync to copy over
the contents of the local data directories and using the local gateway?

Background:
We're considering using ES as the sole database for some important log
data. To provide the necessary protection we need to store the same data in
two data centres. The incoming data is coming from Flume so we can tee and
pipe it around as we like.

Doing this as a single cluster appears to be a bad idea because of the
split brain problem. If the WAN link is broken then both sides will think
that they are the primary.

For this reason, we've been thinking of running two clusters - one in each
data centre and feeding them the same data.

A problem we're considering is what happens if we switch over to the standby
data centre and then want to switch back. Conceivably, we could transfer
the local data directories over the network so that the other cluster is
now mirrored and then startup from there.

Thoughts?

Many thanks in advance.

Cheers,
Edward

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

On Thu, 2013-03-07 at 15:27 -0800, Edward Sargisson wrote:

Hi,
Can a separate cluster be completely recovered by using rsync to copy
over the contents of the local data directories and using the local
gateway?

Yes it can.

Background:
We're considering using ES as the sole database for some important log
data. To provide the necessary protection we need to store the same
data in two data centres. The incoming data is coming from Flume so we
can tee and pipe it around as we like.

Doing this as a single cluster appears to be a bad idea because of the
split brain problem. If the WAN link is broken then both sides will
think that they are the primary.

For this reason, we've been thinking of running two clusters - one in
each data centre and feeding them the same data.

Makes sense.

A problem we're considering is what happens if we switch over to the
standby data centre and then want to switch back. Conceivably, we
could transfer the local data directories over the network so that the
other cluster is now mirrored and then startup from there.

Given that you won't be deleting documents, what about keeping them in
sync with a simple query?

E.g. run a loop which queries all docs where timestamp > $last_run
(requesting the version numbers as well) and indexes those docs into the
other cluster.

That way it'll be easy to run a master-slave setup, and easy to switch
direction.
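
A rough sketch of one pass of that loop, in Python. The names
build_changes_query and sync_once are made up for illustration, and
the search and index callables are placeholders for whatever client
you use against the source and target clusters; the field name
"timestamp" is also an assumption about your mapping. Only the request
body (a range query with "version": true) is standard query DSL.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def build_changes_query(last_run: Any, field: str = "timestamp") -> Dict[str, Any]:
    """Search body for all docs whose `field` is strictly after last_run."""
    return {
        "version": True,  # ask for _version on every hit
        "query": {"range": {field: {"gt": last_run}}},
        "sort": [{field: "asc"}],
    }


def sync_once(
    search: Callable[[Dict[str, Any]], Iterable[Dict[str, Any]]],
    index: Callable[[str, Dict[str, Any], int], None],
    last_run: Any,
    field: str = "timestamp",
) -> Tuple[Any, int]:
    """Copy docs newer than last_run from source to target.

    search(body) is assumed to return hits with _id, _version and _source.
    index(doc_id, source, version) is assumed to write to the other cluster,
    ideally with external versioning so stale copies are rejected.
    Returns (new last_run value, number of docs copied).
    """
    newest = last_run
    copied = 0
    for hit in search(build_changes_query(last_run, field)):
        index(hit["_id"], hit["_source"], hit["_version"])
        newest = max(newest, hit["_source"][field])
        copied += 1
    return newest, copied
```

Calling sync_once repeatedly, feeding the returned value back in as
last_run, gives the loop. To switch direction, swap which cluster the
search and index callables point at. Note that a strict "greater than"
can skip docs that arrive later with the same timestamp as the last
one copied, so you may want some overlap between runs.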

clint
