Problem keeping Elasticsearch in sync across two data centers

Hi,
I have the following problem: our application publishes content to an
Elasticsearch cluster. We then query Elasticsearch through local data-less
(client) nodes, so we don't use the HTTP REST API and the local nodes act
as the load balancer. We have now been given the requirement of replicating
the cluster to another data center (and maybe more in the future) for
resilience.

At the very beginning we thought of having one large cluster that spans
data centers (crazy). This solution has the following problems:

  • The cluster is exposed to the split-brain problem (!)
  • The data-less client nodes will send requests across data centers (is
    there a solution to this?). I can't find a way to avoid it. We don't
    want this to happen because of a) latency and b) firewall issues.

So we concluded that this solution is not really viable and thought of
having one cluster per data center instead, which seems more sensible. But
then we must publish data to all clusters and, if one of them fails, we
have no means of rolling back (unless we set up a complicated version-based
rollback system). I find this very complicated and hard to maintain,
although it is probably doable.
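
One way to picture the version-based idea is to make every write carry an application-assigned version and have each cluster ignore anything that is not newer than what it already holds. A failed publish can then be replayed instead of rolled back. The sketch below is only an in-memory model under that assumption: `VersionedStore` and its `put` method are hypothetical names, not an Elasticsearch client. (As far as I know, the Elasticsearch index API offers a comparable check through `version_type=external`.)

```python
class VersionedStore:
    """In-memory stand-in for one cluster; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def put(self, doc_id, version, body):
        """Apply the write only if it is newer than what is stored."""
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate write, ignored
        self.docs[doc_id] = (version, body)
        return True


dc1, dc2 = VersionedStore(), VersionedStore()

# Version 2 reaches both data centers, then version 3 only reaches dc1.
for store in (dc1, dc2):
    store.put("article-1", 2, {"title": "draft"})
dc1.put("article-1", 3, {"title": "final"})

# Replaying the whole history against both stores is safe: dc1 ignores
# what it already has, dc2 catches up.
history = [(2, {"title": "draft"}), (3, {"title": "final"})]
for store in (dc1, dc2):
    for version, body in history:
        store.put("article-1", version, body)
```

With this kind of guard the publisher can roll forward after a partial failure, which may be simpler to maintain than a true rollback.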

My biggest problem is that we have to keep the data centers in the same
state at all times, so that if one goes down, we can readily switch to the
other.

Any ideas, or can you recommend some support to help us deal with this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5424a274-3f6b-4c12-9fe6-621e04f87a8d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Dario,

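I believe that you're looking for TribeNodes:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html

ES is not built to cluster consistently across DCs or over links with
high network latency.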

On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi darioros@gmail.com wrote:

Hello Michael, I understand that ES is not built to maintain consistent
cluster state across data centers. What I am wondering is whether there is
a way for Elasticsearch to keep replicating data to a different data
center (with some delay, of course) so that when the primary data center
fails, the failover data center still has most of the data (maybe except
for the last few seconds/minutes/hours).

Overall I am looking for the right way to implement a cross-data-center
deployment of Elasticsearch!

-Amit.

On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick <michael.sick@serenesoftware.com> wrote:

Hi Amit,

It sounds like you need separate ES clusters (one per DC) and a way to
feed the data into all of them consistently.

I skimmed the tribe node documentation: it looks like it could work
great for reads, but (IIRC) it will not do writes.

I suspect you want some message-passing system (e.g. RabbitMQ) or Redis
(acting as a cache).

If you were writing (system) logs, then something like Logstash would
help as the interface. However, I suspect that is not the case, so you
would need to find the integration (between the message-passing system
or Redis and ES) that fits your system.

This means that you could probably use something like tribe nodes for
the reads and some message-passing/proxy system for the writes.
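
The message-passing idea for writes can be sketched roughly as follows: the publisher puts each write on one queue per data center, and a consumer in each data center applies the writes to its local cluster at its own pace. This is only an illustration using Python's standard `queue` and `threading` modules in place of a real broker such as RabbitMQ; `FakeCluster`, `consume` and `publish` are hypothetical names, and no Elasticsearch client is involved.

```python
import queue
import threading


class FakeCluster:
    """In-memory stand-in for one data center's cluster (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        self.docs[doc_id] = body


def consume(inbox, cluster):
    """Apply queued writes to the local cluster until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        doc_id, body = message
        cluster.index(doc_id, body)


# One queue per data center; the publisher never talks to a cluster directly.
clusters = {"dc1": FakeCluster(), "dc2": FakeCluster()}
inboxes = {name: queue.Queue() for name in clusters}
workers = [
    threading.Thread(target=consume, args=(inboxes[name], clusters[name]))
    for name in clusters
]
for worker in workers:
    worker.start()


def publish(doc_id, body):
    """Fan one write out to every data center's queue."""
    for inbox in inboxes.values():
        inbox.put((doc_id, body))


publish("article-1", {"title": "hello"})
publish("article-2", {"title": "world"})

# Shut down: the sentinel is queued after the writes, so they are applied first.
for inbox in inboxes.values():
    inbox.put(None)
for worker in workers:
    worker.join()
```

If one data center is unreachable, its queue simply grows until the consumer comes back, so the clusters converge with some delay rather than staying identical at every instant.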

Cheers

Ivan

On 22/02/2014 18:32, Amit Soni wrote:

--
Ivan Beveridge

Hi Amit,

Ivan is correct. You might also check out TribeNodes
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html)
and see if it fits your needs for cross-DC replication.

--Mike

On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amitsoni29@gmail.com wrote:

I think with the current ES version you have three options.

  • Use the great snapshot and restore feature to snapshot from one DC and
    restore in the other one
  • Index in both DCs (so two distinct clusters) at the client level
  • Use the tribe node feature to search or index on multiple clusters

Reference post
https://groups.google.com/forum/#!searchin/elasticsearch/TribeNodes/elasticsearch/MG1RerVSWOk/qZFWvr0HPSwJ
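
The second option, indexing into both clusters at the client level, could look roughly like the sketch below. It is a toy model, not real client code: `FakeCluster`, `ClusterDown` and `index_everywhere` are hypothetical names standing in for whatever Elasticsearch client the application uses. The point it illustrates is that the client has to remember which clusters missed a write so that it can retry them later.

```python
class ClusterDown(Exception):
    """Raised by the stand-in cluster when it is unreachable."""


class FakeCluster:
    """In-memory stand-in for an Elasticsearch cluster (hypothetical)."""

    def __init__(self, available=True):
        self.available = available
        self.docs = {}

    def index(self, doc_id, body):
        if not self.available:
            raise ClusterDown(doc_id)
        self.docs[doc_id] = body


def index_everywhere(clusters, doc_id, body):
    """Index into every cluster; return the names of clusters that failed."""
    failed = []
    for name, cluster in clusters.items():
        try:
            cluster.index(doc_id, body)
        except ClusterDown:
            failed.append(name)
    return failed


clusters = {"dc1": FakeCluster(), "dc2": FakeCluster(available=False)}
pending = index_everywhere(clusters, "article-1", {"title": "hello"})

# Later, once dc2 is back, retry only the clusters that missed the write.
clusters["dc2"].available = True
retry = {name: clusters[name] for name in pending}
still_failed = index_everywhere(retry, "article-1", {"title": "hello"})
```

In a real deployment the list of pending writes would need to survive a client restart, which is one reason a durable queue is often placed in front of the clusters.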

On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

Thanks so much everyone for sharing your thoughts!

-Amit.

On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu hariinfo@gmail.com wrote:

I will try the tribe node feature, even though I don't understand it
completely... but I think it deserves some experimentation.

On Tuesday, 25 February 2014 08:05:05 UTC, amit.soni wrote:

From the docs it is not clear whether, with two clusters that have indices
of the same name, an indexing operation will take effect on both...

There is a line that leaves me a bit doubtful:

However, there are a few exceptions:

  • The merged view cannot handle indices with the same name in multiple
    clusters. It will pick one of them and discard the other.
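Read literally, the quoted exception suggests that a request sent through the tribe node to an index that exists under the same name in both clusters would reach only one of them, so it would not keep the two in sync. Below is a toy model of that rule in plain Python, not Elasticsearch code. The cluster and index names are hypothetical, and "first cluster seen wins" is an assumption made only for the illustration.

```python
def merged_view(clusters):
    """Model a merged view that allows one owning cluster per index name.

    `clusters` maps a cluster name to its list of index names. When an index
    name shows up in more than one cluster, the first cluster seen keeps it
    and the later copies are dropped (assumed tie-break, for illustration).
    """
    owner = {}
    dropped = []
    for cluster, indices in clusters.items():
        for index in indices:
            if index in owner:
                dropped.append((index, cluster))
            else:
                owner[index] = cluster
    return owner, dropped


# Hypothetical setup: both data centers hold an index called "content".
clusters = {
    "dc1": ["content", "logs-dc1"],
    "dc2": ["content", "logs-dc2"],
}
owner, dropped = merged_view(clusters)
print(owner)    # which cluster a request for each index name would go to
print(dropped)  # same-named indices that are invisible through the merged view
```

In this model a write to `content` lands in one data center only, which is why same-named indices and a tribe node do not look like a replication mechanism.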

On Tuesday, 25 February 2014 10:04:05 UTC, Dario Rossi wrote:

I will try the tribe node feature, even if I don't understand it
completely... but I think it deserves some experimentation

On Tuesday, 25 February 2014 08:05:05 UTC, amit.soni wrote:

Thanks so much everyone for sharing your thoughts!

-Amit.

On Sun, Feb 23, 2014 at 10:24 AM, Hariharan Vadivelu hari...@gmail.com wrote:

I think with the current ES version you have 3 options.

  • Use the great snapshot and restore feature to snapshot in one DC and
    restore in the other one
  • Index into both DCs (so two distinct clusters) at the client level
  • Use the tribe node feature to search or index on multiple clusters

Reference post (on Google Groups)
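The second option (indexing into both clusters from the client) could be sketched roughly as follows. This is only an outline under assumptions: `publish_everywhere` and `in_memory_writer` are hypothetical helpers, and the in-memory dictionaries stand in for real cluster clients. Rather than rolling back, it records which cluster missed a document so the write can be retried later.

```python
def publish_everywhere(doc_id, doc, writers):
    """Send one document to every cluster and return the names that failed.

    `writers` maps a cluster name to a callable(doc_id, doc) that raises on
    failure. In real use each callable would wrap a client for that cluster.
    """
    failed = []
    for name, write in writers.items():
        try:
            write(doc_id, doc)
        except Exception:
            # Keep going: the other clusters should still get the document.
            failed.append(name)
    return failed


def in_memory_writer(store, up=True):
    """Stand-in for a cluster client; `up=False` simulates an outage."""
    def write(doc_id, doc):
        if not up:
            raise ConnectionError("cluster unreachable")
        store[doc_id] = doc
    return write


dc1, dc2 = {}, {}
writers = {
    "dc1": in_memory_writer(dc1),
    "dc2": in_memory_writer(dc2, up=False),  # pretend DC2 is down
}

# Remember what each failed cluster missed so it can be replayed later.
retry_queue = []
for name in publish_everywhere("42", {"title": "hello"}, writers):
    retry_queue.append((name, "42"))

print(retry_queue)
```

The two clusters are then only eventually consistent: a data center that was unreachable stays behind until its retry queue has been replayed.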

On Saturday, February 22, 2014 6:03:13 PM UTC-6, Michael Sick wrote:

Hi Amit,

Ivan is correct. You might also check out tribe nodes
(elasticsearch/reference/master/modules-tribe.html in the Elasticsearch
reference) and see if they fit your needs for cross-DC replication.

--Mike

On Sat, Feb 22, 2014 at 1:32 PM, Amit Soni amits...@gmail.com wrote:

Hello Michael - I understand that ES is not built to maintain consistent
cluster state across data centers. What I am wondering is whether there is
a way for Elasticsearch to keep replicating data to a different data
center (with some delay, of course) so that when the primary data center
fails, the failover data center still has most of the data (maybe except
for the last few seconds/minutes/hours).

Overall I am looking for the right way to implement a cross-data-center
deployment of Elasticsearch!

-Amit.
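One way to get this kind of delayed copy is periodic snapshot and restore, mentioned elsewhere in this thread. The sketch below only builds the two REST calls involved and does not send them. The host names and the repository name `dc_backup` are hypothetical, and it assumes both clusters have registered a repository of that name pointing at storage both data centers can reach.

```python
from datetime import datetime, timezone


def replication_plan(primary_url, standby_url, repo, now=None):
    """Return the (method, url) pairs for one snapshot-and-restore cycle.

    Step 1 takes a snapshot on the primary cluster and waits for it to finish.
    Step 2 restores that snapshot on the standby cluster.
    """
    now = now or datetime.now(timezone.utc)
    snapshot = "snap-" + now.strftime("%Y%m%d%H%M%S")
    base = f"/_snapshot/{repo}/{snapshot}"
    return [
        ("PUT", primary_url + base + "?wait_for_completion=true"),
        ("POST", standby_url + base + "/_restore"),
    ]


# Hypothetical endpoints; a fixed timestamp keeps the example reproducible.
plan = replication_plan(
    "http://es.dc1.example:9200",
    "http://es.dc2.example:9200",
    "dc_backup",
    now=datetime(2014, 2, 22, 13, 32, 0, tzinfo=timezone.utc),
)
for method, url in plan:
    print(method, url)
```

Run on a schedule, this leaves the standby behind by at most one snapshot interval plus the restore time. Restoring over an index that already exists on the standby may require closing it first, so check that against the version in use.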

On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick <
michae...@serenesoftware.com> wrote:

Dario,

I believe that you're looking for tribe nodes (modules-tribe.html in the
master branch of the Elasticsearch reference).

ES is not built to cluster consistently across DCs or over links with
high network latency.

On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi dari...@gmail.com wrote:

Hi,
I have the following problem: our application publishes content to an
Elasticsearch cluster. We then query Elasticsearch through local data-less
nodes, so we don't use HTTP REST and the local nodes act as the
load balancer. We now have a requirement to replicate the cluster to
another data center (and maybe more in the future) for resilience.

At the very beginning we thought of having one large cluster that
goes across data centers (crazy). This solution has the following problems:

  • The cluster has the split-brain problem (!)
  • The client data-less nodes will try to send requests across data
    centers (is there a solution to this???). I can't find a way to avoid
    this. We don't want it to happen because of a) latency and b) firewalling
    issues.

So we started to think that this solution is not really viable. We then
thought of having one cluster per data center, which seems more
sensible. But here the problem is that we must publish data to
all clusters and, if one fails, we have no means of rolling back (unless we
try to set up a complicated version-based rollback system). I find this
very complicated and hard to maintain, although it is somewhat doable.

My biggest problem is that we have to keep the data centers in the
same state at all times, so that if one goes down, we can readily switch to
the other.

Any ideas, or can you recommend some support to help us deal with
this?
