Cross data center replication


(Saikat Kanjilal) #1

Hello folks,
I dug through the documentation and found a few wiki posts here and there, but
nothing that answers this question directly: as of the latest release, does
ES currently support cross data center replication out of the box? I've seen
a post or two about use cases where folks run ES on top of a key-value store
that supports this, like Couchbase, but nothing to indicate that ES itself
supports it. Any insight or links to docs on this would be very helpful.

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Daniel Maher-3) #2

Hello,

I'd wager that the question you're really asking about is how to control
where shards are placed; if you can make deterministic statements about
where shards are, then you can create your own "rack-aware" or "data
centre-aware" scenarios. ES has supported this "out of the box" for
well over a year now (possibly longer).

You'll want to investigate "zones" and "routing allocation", which are
the key elements of shard placement. There is an excellent blog post
which describes exactly how to set things up here:
http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/
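
As a rough illustration of the "zones" idea, here is a minimal Python sketch that builds a request body for the cluster update settings API to turn on allocation awareness. It assumes every node has already been tagged with a custom attribute named `zone` in its own config (the attribute name and the values `dc1`/`dc2` are made up for this example, and exact setting names can differ between ES versions, so check the docs for your release).

```python
import json

def awareness_settings(attribute, values=None):
    """Build a cluster-settings body that enables shard allocation awareness."""
    settings = {"cluster.routing.allocation.awareness.attributes": attribute}
    if values:
        # Forced awareness: list the expected attribute values so that replicas
        # are not all piled into the surviving zone when another zone is down.
        key = "cluster.routing.allocation.awareness.force.%s.values" % attribute
        settings[key] = ",".join(values)
    return {"persistent": settings}

# "zone", "dc1" and "dc2" are hypothetical names for this example.
body = awareness_settings("zone", ["dc1", "dc2"])
print(json.dumps(body, indent=2, sort_keys=True))
```

The printed JSON is what you would send to the cluster settings endpoint. Note that this only spreads shard copies across zones inside one cluster; it does not replicate between separate clusters.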

Enjoy !

--
dan (phrawzty).
mozilla webops; european outpost.



(Norberto Meijome) #3

Along the same lines, once you have your zones and shards allocated across
different DCs, is it possible to make queries originating in DC #1 stay
in DC #1? I.e., how can we stop the ES nodes from distributing the
queries across all nodes? Alternatively, is there a way to tell the ES
cluster about 'shard distance' (so that queries are optimised to minimise
shard distance)?

Thanks!!
Beto

--
Norberto 'Beto' Meijome



(Radu Gheorghe) #4

Hello Beto,

I haven't used this feature (yet), but there are some options you can specify
at query time for shard preference:
http://www.elasticsearch.org/guide/reference/api/search/preference/
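
To make that concrete, here is a small Python sketch that only builds the search URL with a `preference` query parameter; it does not send anything. The host, index name and node id are placeholders, and `_local` and `_only_node:<id>` are the preference values I believe the page above describes, so double-check them against the docs for your version.

```python
from urllib.parse import urlencode

def search_url(host, index, preference):
    """Return a search URL that carries the given shard preference."""
    query = urlencode({"preference": preference})
    return "%s/%s/_search?%s" % (host.rstrip("/"), index, query)

# Placeholder host, index and node id; substitute your own.
print(search_url("http://localhost:9200", "myindex", "_local"))
print(search_url("http://localhost:9200", "myindex", "_only_node:abc123"))
```

`_local` asks the receiving node to favour shard copies it holds itself, which is the closest thing here to "stay in this DC" as long as the client talks to a node in its own DC.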

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene



(Norberto Meijome) #5

Thanks guys. Yeah, I had noticed those options. It seems I'll need an
external service (ZK) to synchronise the state of the ES cluster to the app,
given that the only option is to specify a node by node_id rather
than by a property of the node (where it is located, for example).
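
A rough sketch of what that glue could look like in Python: take a nodes-info style response, pick the ids of the nodes in a given zone, and turn one of them into an `_only_node` preference. The response shape (`nodes` -> id -> `attributes`) is my assumption about what the nodes info API returns, and the ids, names and zones below are invented.

```python
def node_ids_in_zone(nodes_info, zone, attribute="zone"):
    """Return the sorted ids of nodes whose custom attribute matches the zone."""
    return sorted(
        node_id
        for node_id, node in nodes_info.get("nodes", {}).items()
        if node.get("attributes", {}).get(attribute) == zone
    )

# Hand-written stand-in for a nodes info response; all values are made up.
sample = {
    "nodes": {
        "n1": {"name": "search-a", "attributes": {"zone": "us-east-1a"}},
        "n2": {"name": "search-b", "attributes": {"zone": "us-east-1b"}},
        "n3": {"name": "search-c", "attributes": {"zone": "us-east-1a"}},
    }
}

local = node_ids_in_zone(sample, "us-east-1a")
# Only one node id fits into an _only_node preference, so pick the first.
preference = "_only_node:%s" % local[0] if local else None
print(local, preference)
```

The app would still have to refresh this mapping whenever nodes join or leave, which is the part an external service such as ZK would cover.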

It does sound like a useful thing to have, IMO. In case it isn't obvious,
I'm running ES on AWS.

cheers,
Beto

--
Norberto 'Beto' Meijome



(phill) #6

Is shard allocation really the correct solution if the data centers are
globally distributed?

If I have a data center in the US intended to serve data from the US,
but it should also have access to Europe and Asia data, and clusters in
both Europe and Asia with similar needs, would I really want to use
zones etc. and have one great global cluster with data-center-aware
configurations?

Assuming that the US would be happy to deal with old documents from Asia
and Europe when Asia or Europe is offline or just not caught up, it
would seem that you would NOT want a "world" cluster, because I can't
picture how you'd configure a 3-part world cluster to index into
the right indices and search the right (possible combination of) shards
while also preventing "split brain".

In the scenario I've described, I would think each data center might
better provide availability and eventual consistency (with less concern
for the remote data from the other regions) by having three clusters and
some type of syncing from one index to copies at the other two
locations. For example, the US data center might have US,
copyOfEurope, and copyOfAsia indices.
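
To spell out the layout I have in mind, here is a small Python sketch that only computes names; it copies no data. The region names and the `copyOf` prefix follow the example above, and the function names are made up for illustration.

```python
def index_layout(regions):
    """Map each region's cluster to its own index plus copies of the others."""
    return {
        home: [r if r == home else "copyOf" + r for r in regions]
        for home in regions
    }

def sync_pairs(regions):
    """List (source cluster, source index, target cluster, target index) jobs."""
    return [
        (src, src, dst, "copyOf" + src)
        for src in regions
        for dst in regions
        if src != dst
    ]

regions = ["US", "Europe", "Asia"]
print(index_layout(regions)["US"])
for job in sync_pairs(regions):
    print(job)
```

With three regions that is six one-way copy jobs, each of which can lag or fail on its own without affecting the local index at the target.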

Does anyone have any observations about such a worldwide scenario?
Are there any index-to-index copy utilities?
Is there a river or other plugin that might be useful for this scenario of
three clusters working together?
How about the project https://github.com/karussell/elasticsearch-reindex?
Comments?

-Paul



(Norberto Meijome) #7

+1 on all of the above. es-reindex is already on my list of things to
investigate (for a number of issues...)

cheers,
b


--
Norberto 'Beto' Meijome



(Todd Nine) #8

Hey all,

Sorry to resurrect a dead thread. Did you ever find a solution for
eventual consistency of documents across EC2 regions?

Thanks,
todd



(MatthewParrott) #9

I'm interested in this too.
es-reindex seems to lack conflict resolution and, as noted in its
docs, would be better implemented as a river.
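
For what it's worth, the simplest conflict resolution I can think of is "highest version wins". Below is a minimal Python sketch of that rule over plain dictionaries. It is not what es-reindex does, and the document shape (`version` plus `source`) is a made-up stand-in for whatever a sync job would actually read from each cluster.

```python
def merge_last_write_wins(local_docs, remote_docs):
    """Merge two {id: {"version": int, "source": dict}} maps, keeping the higher version."""
    merged = dict(local_docs)
    for doc_id, remote in remote_docs.items():
        current = merged.get(doc_id)
        # Take the remote copy only if it is new or strictly newer than ours.
        if current is None or remote["version"] > current["version"]:
            merged[doc_id] = remote
    return merged

# Made-up documents from two regions.
us = {"a": {"version": 2, "source": {"title": "us edit"}}}
eu = {
    "a": {"version": 3, "source": {"title": "eu edit"}},
    "b": {"version": 1, "source": {"title": "eu only"}},
}
result = merge_last_write_wins(us, eu)
print(result["a"]["source"]["title"], sorted(result))
```

Ties keep the local copy here, which is an arbitrary choice; a real setup would need a tie-breaker that both sides agree on.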


