Multi DC cluster or separate cluster per DC?

Sebastian_Laskawiec · May 6, 2014, 1:51pm

Hi!

I'd like to ask for advice about deployment in a multi-DC scenario.

Currently we operate on 2 Data Centers in active/standby mode. In case of
ES we'd like to have a different approach - we'd like to operate in
active-active mode (we want to optimize our resources, especially for
querying).
Here are some details about target configuration:

4 ES instances per DC. Full cluster will have 8 instances.
Up to 1 TB of data
Data pulled from database using JDBC River
Database is replicated asynchronously between DCs. Each DC will have
its own database instance to pull data.
Average latency between DCs is a few milliseconds
We need to keep operating when the passive DC is down

We know that a multi-DC configuration might end up with a split-brain issue.
Here is how we want to prevent it:

Set node.master: true only in 4 nodes in active DC
Set node.master: false in passive DC
This way we'll be sure that new cluster will not be created in passive
DC
Additionally we'd like to set discovery.zen.minimum_master_nodes: 3
(to avoid Split Brain in active DC)
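
The value 3 above follows from the usual majority rule for master-eligible nodes. Here is a minimal sketch of that arithmetic, assuming the 4 master-eligible nodes described in this post; the function names are hypothetical helpers for illustration, not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum for master election: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(reachable_master_eligible: int, quorum: int) -> bool:
    """A partition can elect a master only if it still sees a quorum."""
    return reachable_master_eligible >= quorum


# 4 master-eligible nodes, all in the active DC (assumption from the post).
quorum = minimum_master_nodes(4)
print(quorum)

# The passive DC has no master-eligible nodes, so on its own it cannot
# elect a master, whatever the quorum is.
print(can_elect_master(0, quorum))
```

Note that the same rule means the passive DC cannot take over by itself if the active DC is lost, which is the switchover problem described below.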

Additionally there is a problem with switchover (the passive DC becomes active
and the active one becomes passive). In our system it takes about 20 minutes, and
this is the maximum length of our maintenance window. We were thinking of
shutting down the whole ES cluster and switching the node.master setting in the
configuration files (as far as I know this setting cannot be changed via the
REST API). Then we'd need to start the whole cluster again.

So my question is: is it better to have one big ES cluster operating on
both DCs or should we change our approach and create 2 separate clusters
(and rely on database replication)? I'd be grateful for advice.

Regards
Sebastian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6be53754-63fd-4202-b940-750a3e0c1a8f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · May 7, 2014, 1:23am

Go with the latter method and have two clusters. ES can be very sensitive to
network latency and you'll likely end up with more problems than it is
worth.
Given you already have the source-of-truth data being replicated, the
sanest option is to just read that locally.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Sebastian_Laskawiec · May 10, 2014, 8:05am

Thanks for the answer! We've been talking with several other teams in our
company and it looks like this is the most recommended and stable setup.

Regards
Sebastian

dkjhanitt · May 12, 2014, 6:02pm

Having a separate cluster is definitely the better way to go. Or you can
control shard and replica placement so that they are always placed in the
same DC. This way you can avoid inter-DC issues while still having a single
cluster. I have a similar issue and I am looking at this as one of the
alternatives.
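
One way to read this suggestion is Elasticsearch's shard allocation awareness. The sketch below only builds the request body for the cluster update settings API. It assumes every node has been tagged with a custom attribute called `dc` with values `dc1` and `dc2` (both names are made up for illustration), and how that attribute is declared on a node depends on the Elasticsearch version, so check the documentation for yours.

```python
import json
from typing import Sequence


def forced_awareness_settings(attribute: str, values: Sequence[str]) -> dict:
    """Build a cluster-settings body enabling forced allocation awareness.

    `attribute` is the custom node attribute that identifies the DC
    (hypothetical name chosen by the operator).
    """
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            f"{prefix}.attributes": attribute,
            f"{prefix}.force.{attribute}.values": ",".join(values),
        }
    }


# Assumed attribute name and DC labels, not taken from the thread.
body = forced_awareness_settings("dc", ["dc1", "dc2"])
print(json.dumps(body, indent=2))
```

Awareness controls where shard copies are allocated. It does not remove the need for a master quorum across the DC link, so the split-brain concerns from the first post still apply.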

Amit_Soni · May 14, 2014, 6:50am

I am just wondering whether the Elasticsearch team has any plans to add
features for multi-data center deployment (active-active)?

-Amit.

Sebastian_Laskawiec · May 15, 2014, 7:23am

We are still thinking about production configuration and here is a short
list of single/separate cluster's advantages and disadvantages...

Single cluster:

(+) If you have a single cluster, you perform a single query to the
database. With a cluster per DC, each cluster needs to query the DB
separately
(+) Data consistency - as a matter of fact this is achieved by the
single query to the DB
(+) You can introduce a new DC easily
(+) True active-active configuration
(-) Split brain and pretty complicated configuration (to avoid split
brain when the DC link is down)
(-) The node.master setting cannot be changed at runtime (take a look at
my first post and the split brain solution)
(-) In case of a disaster we need to operate on a single DC. If you use
a single cluster across 2 DCs you can't really tell if a single DC is strong
enough to handle the query and indexing load
(-) In the pessimistic scenario data travels through the WAN 2 times (first
time - database replication, second time - ES replication)
(-) You can't really tell which node will respond to the query. Let's
assume that you have the full index in each DC (forced awareness option). ES
might decide to gather results from the remote DC and not from the local
one. This way you need to add WAN latency to your query time.
(-) You need to turn off the whole cluster or perform cycle restarts
during an upgrade

Separate cluster per DC:

(+) No Split brain
(+) You can tell precisely when you are out of resources to handle
load in ES cluster in each DC
(+) You can experiment with different settings in production. If
something goes wrong - just switch clients to the standby DC.
(+) Full failover - in case of any problems, just switch to the other
DC
(+) Upgrades are easy and you have no downtime (upgrade the first DC,
stabilize it, test it, and then do the same in the other DC)
(+) Since these are 2 separate clusters you can avoid data traveling
through the WAN during queries. Each DC queries nodes locally.
(-) It is not a full active-active configuration. It's more like an
active-standby configuration
(-) Data inconsistency might occur (different results when querying
the local and the remote DC)
(-) Each DC will query DB separately. This will generate additional
load to the DB
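
The "just switch clients to the standby DC" point above can be sketched roughly as follows. This is only an illustration under assumptions: the endpoint URLs are made up, and the health check is passed in as a function so the example stays self-contained (in practice it might call each cluster's health endpoint).

```python
from typing import Callable, Optional, Sequence


def pick_cluster(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy endpoint, in order of preference.

    The local DC goes first in `endpoints`, the standby DC after it.
    Returns None when no cluster is reachable.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints: local DC first, standby DC second.
endpoints = ["http://es.dc1.example:9200", "http://es.dc2.example:9200"]

# Simulate the local cluster being down.
down = {"http://es.dc1.example:9200"}
chosen = pick_cluster(endpoints, lambda url: url not in down)
print(chosen)
```

Keeping the preference order explicit means queries stay in the local DC whenever it is healthy, which matches the "each DC queries nodes locally" advantage.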

Right now we think we should go for 2 separate clusters. DB load is the thing
which worries me the most (we have a really complicated query with a lot of
left joins). However, we think that in our case having two separate clusters has
more advantages than disadvantages.

If you have some more arguments or comments - please let us know

Regards
Sebastian
