ES and multiple datacenters

Hi Shay.

Just a question.

Is ES datacenter aware?

Does it support some kind of replication of the index or the data if I
want to sync it between 2 datacenters?

10x

Hi,

There are several ways to try and solve the "data center problem". In
short, elasticsearch is not data center aware. If you want to sync between
two data centers, you need to do it manually. How do you solve the two data
centers problem with your data storage? Maybe based on that I can help.

-shay.banon

On Thu, Mar 25, 2010 at 10:38 PM, Ori Lahav olahav@gmail.com wrote:

Hi Shay.

Just a question.

Is ES datacenter aware?

Does it support some kind of replication of the index or the data if I
want to sync it between 2 datacenters?

10x

So... it depends on which data storage:

  • MySQL has its own replication mechanism.
  • Solr - same thing - has its own replication.
  • MogileFS is DC aware and can send X replicas of the data to a second DC.
  • Cassandra and Hadoop are DC aware.

I think something along the lines of Cassandra's awareness might be great.

Do you have any plans for this feature?

On Thu, Mar 25, 2010 at 11:11 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Hi,

There are several ways to try and solve the "data center problem". In
short, elasticsearch is not data center aware. If you want to sync between
two data centers, you need to do it manually. How do you solve the two data
centers problem with your data storage? Maybe based on that I can help.

-shay.banon

Do you handle cases where the two data centers have conflicting updates?
Cassandra "can" handle it; about the others I am not so sure... What exactly
are you after with two data centers? One active and one backup, with reads
going local to each?

On Thu, Mar 25, 2010 at 11:36 PM, Ori Lahav olahav@gmail.com wrote:

So... it depends on which data storage:

  • MySQL has its own replication mechanism.
  • Solr - same thing - has its own replication.
  • MogileFS is DC aware and can send X replicas of the data to a second DC.
  • Cassandra and Hadoop are DC aware.

I think something along the lines of Cassandra's awareness might be great.

Do you have any plans for this feature?

Yes, that can be a great start:
update the index in one "master" datacenter and then replicate it to the
second one. Reads are done on both DCs.
I also want to keep the option that, if the "master" DC fails, I can move the
writes to the second one.

10x

On Thu, Mar 25, 2010 at 11:38 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Do you handle cases where the two data centers have conflicting updates?
Cassandra "can" handle it; about the others I am not so sure... What exactly
are you after with two data centers? One active and one backup, with reads
going local to each?

You did not answer my question :). How do you handle it today? Do you handle
conflicting updates?

Regarding elasticsearch, then yes, I do plan to support various models:

  • Two completely separate clusters that replicate changes to the other
    cluster. Reads / search will go to the local datacenter by "default", since
    you configure the search / read clients in each data center (your web tier
    or something similar) to work against the local cluster.

Note, you can do it today quite easily on the "client" side. Make sure the
code you use to index data applies each change to both data centers (queue
it, or something similar).

  • A single cluster that spans two data centers, with a special allocation
    strategy that makes sure that a shard and its replica do not exist in the
    same data center, and that read / search prefer "local" data center shards
    over going to search in another data center.

Both are not that difficult to implement thanks to how elasticsearch is
designed.

-shay.banon
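
A minimal sketch of the client-side approach mentioned above, where the indexing code applies every change to both data centers and queues whatever could not be delivered. This is not elasticsearch code: the class name, the `senders` mapping and the use of `ConnectionError` are assumptions made for illustration only, and each sender stands in for whatever client call actually indexes a document in one cluster.

```python
from collections import deque


class DualDataCenterIndexer:
    """Applies each index operation to every data center, in order.

    Operations that cannot be delivered stay queued and are retried later.
    """

    def __init__(self, senders):
        # senders: hypothetical mapping of data center name -> callable(doc_id, doc)
        self.senders = senders
        self.pending = {name: deque() for name in senders}

    def index(self, doc_id, doc):
        # Queue the operation for every data center, then try to deliver.
        for name in self.senders:
            self.pending[name].append((doc_id, doc))
        self.flush()

    def flush(self):
        for name, send in self.senders.items():
            queue = self.pending[name]
            while queue:
                doc_id, doc = queue[0]
                try:
                    send(doc_id, doc)
                except ConnectionError:
                    # Data center unreachable: keep the order, retry on the next flush.
                    break
                queue.popleft()
```

Keeping one queue per data center means a slow or unreachable remote site does not block writes to the local one, at the cost of the remote copy lagging behind until the queue drains.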

On Thu, Mar 25, 2010 at 11:43 PM, Ori Lahav olahav@gmail.com wrote:

Yes, that can be a great start:
update the index in one "master" datacenter and then replicate it to the
second one. Reads are done on both DCs.
I also want to keep the option that, if the "master" DC fails, I can move the
writes to the second one.

10x

So... basically we have no need to handle conflicts, as writes are done
at only one DC and replicated to the other.
I will be happy to hear about it when we meet :)

On Thu, Mar 25, 2010 at 11:53 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

You did not answer my question :). How do you handle it today? Do you
handle conflicting updates?

Regarding elasticsearch, then yes, I do plan to support various models:

  • Two completely separate clusters that replicate changes to the other
    cluster. Reads / search will go to the local datacenter by "default", since
    you configure the search / read clients in each data center (your web tier
    or something similar) to work against the local cluster.

Note, you can do it today quite easily on the "client" side. Make sure the
code you use to index data applies each change to both data centers (queue
it, or something similar).

  • A single cluster that spans two data centers, with a special allocation
    strategy that makes sure that a shard and its replica do not exist in the
    same data center, and that read / search prefer "local" data center shards
    over going to search in another data center.

Both are not that difficult to implement thanks to how elasticsearch is
designed.

-shay.banon

"A single cluster that spans two data centers, with special allocation
strategy" +1 This would be great.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
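
To make the single-cluster idea quoted above a little more concrete, here is a rough sketch of the two rules it describes: a shard and its replica never share a data center, and reads prefer a copy in the local data center. It does not use or describe any elasticsearch API; `allocate`, `pick_copy` and the `nodes_by_dc` mapping are hypothetical names, and exactly two data centers with one replica per shard are assumed.

```python
from itertools import cycle


def allocate(shard_count, nodes_by_dc):
    """Place each shard's primary and replica in different data centers.

    nodes_by_dc: hypothetical mapping of data center name -> list of node names.
    Exactly two data centers are assumed in this sketch.
    """
    dc_a, dc_b = sorted(nodes_by_dc)
    nodes_a, nodes_b = cycle(nodes_by_dc[dc_a]), cycle(nodes_by_dc[dc_b])
    placement = {}
    for shard in range(shard_count):
        copy_a = (dc_a, next(nodes_a))
        copy_b = (dc_b, next(nodes_b))
        # First entry is the primary; alternate its data center to spread writes.
        placement[shard] = [copy_a, copy_b] if shard % 2 == 0 else [copy_b, copy_a]
    return placement


def pick_copy(copies, local_dc):
    """Prefer a copy in the local data center; fall back to any other copy."""
    for dc, node in copies:
        if dc == local_dc:
            return node
    return copies[0][1]
```

With two data centers and one replica, every shard has a copy on each side, so `pick_copy` only falls back to a remote node when the local copy is missing (for example after a node failure).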

Ahh, life is simple :). This is a much simpler case to solve.

-shay.banon

On Thu, Mar 25, 2010 at 11:59 PM, Ori Lahav olahav@gmail.com wrote:

So... basically we have no need to handle conflicts, as writes are
done at only one DC and replicated to the other.
I will be happy to hear about it when we meet :)

Bringing this old topic back up a bit...

We have done multi-datacenter deployments of Solr and replicate across them:
initially by doing snapshots of changes made before optimize, later with
file system replication, and finally by having a transaction log that
feeds Solr and is replayed on the 2nd data center, where indexing is
performed on its own.

The log files therefore serve a second purpose: they allow quick re-indexing
without going to the source (if they are retained). Something worth thinking
about for ES. Have a schema change? Reindex by replaying the logs, at a much
higher rate than the ingestion system might manage starting from the raw
source.

This also helps in the case where ES is the only store (other than raw
source material such as files) and you want to trust you have a quicker
rebuild. And it helps if you have another source such as a DB where you may
not have a quick way to bulk export for a full reindex.

(Note: this log is obviously pre-analyzer and consists basically of input
documents.)

I can quickly add this to my fork of ES and see how it plays (assuming that
the client-side writes logs from multiple writers, and a River would consume
them on ES by merge sorting transactions from the many logs in bulk)

--j
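
A small sketch of the replay step described above: entries from several writers' logs are merged in order and handed to the indexer in bulk. The log format is an assumption, not something specified in this thread: one JSON object per line, with a `seq` field that orders entries across all writers and the raw input document under `doc`. `index_bulk` is a hypothetical callable standing in for whatever actually indexes a batch.

```python
import heapq
import json


def read_log(lines):
    """Yield (sequence, document) pairs from one writer's log.

    Assumed format: one JSON object per line, already sorted by "seq".
    """
    for line in lines:
        entry = json.loads(line)
        yield entry["seq"], entry["doc"]


def replay(logs, index_bulk, batch_size=2):
    """Merge entries from several sorted logs by sequence and index them in bulk."""
    batch = []
    # heapq.merge streams the logs lazily, so they never have to fit in memory.
    merged = heapq.merge(*(read_log(log) for log in logs), key=lambda e: e[0])
    for _, doc in merged:
        batch.append(doc)
        if len(batch) >= batch_size:
            index_bulk(batch)
            batch = []
    if batch:
        index_bulk(batch)
```

Because the log holds pre-analyzer input documents, the same `replay` call could serve both the second data center and a local reindex after a schema change.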

View this message in context: http://elasticsearch-users.115913.n3.nabble.com/ES-and-multiple-datacenters-tp551665p1860682.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Shay - this is a pretty old thread, and I am wondering whether there are features in ES now for multi data center deployment.
