Multi DC cluster or separate cluster per DC?


(Sebastian Łaskawiec) #1

Hi!

I'd like to ask for advice about deployment in multi DC scenario.

Currently we operate on 2 Data Centers in active/standby mode. In case of
ES we'd like to have different approach - we'd like to operate in
active-active mode (we want to optimize our resources especially for
querying).
Here are some details about target configuration:

  • 4 ES instances per DC. Full cluster will have 8 instances.
  • Up to 1 TB of data
  • Data pulled from database using JDBC River
  • Database is replicated asynchronously between DCs. Each DC will have
    its own database instance to pull data.
  • Average latency between DCs is a few milliseconds
  • We need to operate when passive DC is down

We know that multi DC configuration might end with Split Brain issue. Here
is how we want to prevent it:

  • Set node.master: true only in 4 nodes in active DC
  • Set node.master: false in passive DC
  • This way we'll be sure that new cluster will not be created in passive
    DC
  • Additionally we'd like to set discovery.zen.minimum_master_nodes: 3
    (to avoid Split Brain in active DC)
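
The value 3 matches the usual majority rule for master-eligible nodes, (n / 2) + 1. A minimal Python sketch of that arithmetic, assuming this rule applies (the function name is only illustrative). It also shows why making all 8 nodes master-eligible would not help when the link between DCs fails:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum for a given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Plan from the post: only the 4 nodes in the active DC are master-eligible.
print(minimum_master_nodes(4))  # 3

# Hypothetical alternative: all 8 nodes master-eligible, 4 per DC.
quorum = minimum_master_nodes(8)
# If the inter-DC link fails, each side sees only its own 4 nodes,
# so neither side reaches the quorum and no master can be elected.
print(quorum, 4 >= quorum)  # 5 False
```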

Additionally, there is a problem with switchover (passive DC becomes active
and active becomes passive). In our system it takes about 20 minutes and
this is the maximum length of our maintenance window. We were thinking of
shutting down the whole ES cluster and switching the node.master setting in
the configuration files (as far as I know this setting cannot be changed via
the REST API). Then we'd need to start the whole cluster again.
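
To make the switchover concrete, here is a rough Python sketch that renders the two settings mentioned above for a node, depending on which DC is currently active. The DC names `dc1`/`dc2` and the helper names are assumptions for illustration; only `node.master` and `discovery.zen.minimum_master_nodes` come from the post.

```python
def node_settings(node_dc: str, active_dc: str, masters_per_dc: int = 4) -> dict:
    """Settings for one node, given which DC is currently active."""
    return {
        # Only nodes in the active DC are master-eligible.
        "node.master": node_dc == active_dc,
        # Majority of the master-eligible nodes in the active DC.
        "discovery.zen.minimum_master_nodes": masters_per_dc // 2 + 1,
    }


def to_yaml_lines(settings: dict) -> str:
    """Render flat settings as elasticsearch.yml style lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Before the switchover dc1 is active; afterwards dc2 is.
for active in ("dc1", "dc2"):
    for dc in ("dc1", "dc2"):
        print(f"# active={active}, node in {dc}")
        print(to_yaml_lines(node_settings(dc, active)))
```

Every node's file changes on a switchover, which is why the plan above implies a full cluster restart.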

So my question is: is it better to have one big ES cluster operating on
both DCs or should we change our approach and create 2 separate clusters
(and rely on database replication)? I'd be grateful for advice.

Regards
Sebastian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6be53754-63fd-4202-b940-750a3e0c1a8f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Go with the latter approach and have two clusters. ES can be very sensitive
to network latency and you'll likely end up with more problems than it is
worth.
Given that your source of truth is already being replicated, the sanest
option is to just read from it locally.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 6 May 2014 23:51, Sebastian Łaskawiec <sebastian.laskawiec@gmail.com> wrote:



(Sebastian Łaskawiec) #3

Thanks for the answer! We've been talking with several other teams in our
company and it looks like this is the most recommended and stable setup.

Regards
Sebastian

On Wednesday, 7 May 2014 at 03:23:43 UTC+2, Mark Walkom wrote:



(dkjhanitt) #4

Having a separate cluster is definitely the better way to go. Or you can
control shard and replica placement so that they are always placed in the
same DC. This way you can avoid inter-DC issues while still having a single
cluster. I have a similar issue and I am looking at this as one of the
alternatives.
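
One way to control placement like this is shard allocation filtering: tag each node with a custom attribute and set `index.routing.allocation.include.<attribute>` on the index. The Python sketch below only builds the settings body and sends nothing. The attribute name `dc`, its values and the index names are assumptions, and how the node attribute is declared differs between Elasticsearch versions, so check the documentation for the version in use.

```python
import json


def pin_index_to_dc(dc: str) -> dict:
    """Index settings body that keeps all shard copies on nodes tagged with this DC."""
    return {"index.routing.allocation.include.dc": dc}


# Hypothetical layout: one index per DC, each pinned to its own DC.
for index, dc in (("products_dc1", "dc1"), ("products_dc2", "dc2")):
    # This body would be sent with PUT /<index>/_settings.
    print(index, json.dumps(pin_index_to_dc(dc)))
```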

On Saturday, May 10, 2014 1:05:08 AM UTC-7, Sebastian Łaskawiec wrote:



(Amit Soni) #5

I am just wondering whether the Elasticsearch team has any plans to add
features for multi-data center deployment (active-active)?

-Amit.

On Mon, May 12, 2014 at 11:02 AM, Deepak Jha dkjhanitt@gmail.com wrote:



(Sebastian Łaskawiec) #6

We are still thinking about production configuration and here is a short
list of single/separate cluster's advantages and disadvantages...

Single cluster:

  • (+) If you have a single cluster, you perform a single query to the
    database. With a cluster per DC, each cluster needs to query the DB
    separately
  • (+) Data consistency - as a matter of fact this is achieved by the
    single query to the DB
  • (+) You can introduce a new DC easily
  • (+) True active-active configuration
  • (-) Split brain and pretty complicated configuration (to avoid split
    brain when the DC link is down)
  • (-) The node.master setting cannot be changed at runtime (take a look at
    my first post and the split brain solution)
  • (-) In case of a disaster we need to operate on a single DC. If you use
    a single cluster across 2 DCs you can't really tell if a single DC is
    strong enough to handle the query and indexing load
  • (-) In a pessimistic scenario data travels through the WAN 2 times (first
    time - database replication, second time - ES replication)
  • (-) You can't really tell which node will respond to a query. Let's
    assume that you have a full index in each DC (forced awareness option). ES
    might decide to gather results from the remote DC and not from the local
    one. This way you need to add WAN latency to your query time.
  • (-) You need to turn off the whole cluster or perform rolling restarts
    during an upgrade

Separate cluster per DC:

  • (+) No Split brain
  • (+) You can tell precisely when you are out of resources to handle
    load in ES cluster in each DC
  • (+) You can experiment with different settings on production. If
    something goes wrong - just switch clients to standby DC.
  • (+) Full failover - in case of any problems - just switch to the other
    DC
  • (+) Upgrades are easy and you have no downtime (upgrade the first DC,
    stabilize it, test it, and then do the same in the other DC)
  • (+) Since these are 2 separate clusters you can avoid data traveling
    through WAN during queries. Each DC queries nodes locally.
  • (-) It is not a full active-active configuration. It's more like an
    active-standby configuration
  • (-) Data inconsistency might occur (different results when querying the
    local and the remote DC)
  • (-) Each DC will query DB separately. This will generate additional
    load to the DB
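
The "just switch clients to the other DC" point can be sketched on the client side as picking the first cluster that passes a health check, with the local DC listed first. The endpoint URLs below are hypothetical and the health check is injected, so the example runs without a network; a real check could call the cluster health API (`GET /_cluster/health`) on each endpoint.

```python
from typing import Callable, Optional, Sequence


def pick_cluster(endpoints: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints; the local DC is listed first.
ENDPOINTS = ["http://es.dc1.example:9200", "http://es.dc2.example:9200"]

# Stand-in health check for the example: pretend dc1 is down.
down = {"http://es.dc1.example:9200"}
print(pick_cluster(ENDPOINTS, lambda url: url not in down))
```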

Right now we think we should go for 2 separate clusters. DB load is the thing
that worries me the most (we have a really complicated query with a lot of
left joins). However, we think that in our case having two separate clusters
has more advantages than disadvantages.

If you have some more arguments or comments - please let us know :)

Regards
Sebastian

On Monday, 12 May 2014 at 20:02:35 UTC+2, Deepak Jha wrote:


