Replica allocation control across a cluster

Hello All,

I am trying to run an ES cluster across two data centers. Let's say DC1 and
DC2.

I have 6 nodes in total (N0 … N5), where N0 … N2 belong to DC1 and N3 … N5
belong to DC2.

I plan to have 12 shards per index and 1 replica per shard.

My primary objective is to have the primary shards in one DC and their
replicas in the other DC (a master-slave style setup). This way I can achieve
some kind of disaster recovery if one DC goes down: say DC1 goes down, then
DC2 still holds a complete copy of the data and remains fully functional.

My goals are the following:

  1. Control the allocation of the primary shards to one DC.
  2. Control the allocation of the replicas to the other DC.

Goal 1 can be achieved using shard allocation filtering, as described at
http://www.elasticsearch.org/guide/reference/index-modules/allocation/, by
tagging the nodes and requiring the DC1 tag on the index:

"index.routing.allocation.require.tag" : "DC1"

This way I can ensure all the primary shards are in DC1.
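
For example, something along these lines (a rough sketch of what I have in mind; "tag" here is just a custom node attribute and "my_index" a placeholder index name):

# elasticsearch.yml on the DC1 nodes (N0 ... N2)
node.tag: DC1

# elasticsearch.yml on the DC2 nodes (N3 ... N5)
node.tag: DC2

# require the DC1 tag on the index, so every shard copy of it must sit on a DC1-tagged node
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index.routing.allocation.require.tag": "DC1"
}'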

However, I haven't been able to find any documentation on how I can control
the allocation of the replicas to the other DC (DC2 in this case).

Do replicas follow the same shard allocation configuration? If so, will all
the replicas end up in DC1 too? I did come across a few posts touching on
this issue, but the solutions seemed unclear. With the newer and better
versions of Elasticsearch, I am wondering if this issue has now been addressed.

Any input is appreciated.

Thank you,

Kunal

P.S.: I am new to Elasticsearch and this is my first post in this community,
so please excuse any obvious mistakes and feel free to point them out.
Also, I want to thank the community for the great questions and solutions
already posted.


After a deeper dive into the docs, I found out that Elasticsearch is capable
of achieving this through simple configuration.
Referencing the "Shard Allocation Awareness" section at
http://www.elasticsearch.org/guide/reference/modules/cluster/,
I created a rack_id attribute on all the nodes. The value I assigned each
node was either "DC1" or "DC2".

For this test, I created a cluster with 4 nodes, N0 ... N3 (unicast discovery,
since multicast doesn't work in our cross-datacenter setup).

I updated the elasticsearch.yml file with

node.rack_id: DC1   # N0 and N1 have the value DC1; N2 and N3 have the value DC2
cluster.routing.allocation.awareness.attributes: rack_id

For all my primaries, the replicas were allocated in the other DC.
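
(Shard placement can be double-checked with the cluster state API, e.g.:

curl 'http://localhost:9200/_cluster/state?pretty'
# the routing_table / routing_nodes sections list every shard, whether it is a primary, and the node it is allocated to
)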

For further testing, I shut down N3 and N2 to simulate DC2 going down. At
that point I still had all the data in DC1 and the cluster was 100%
functional.
I then shut down N1 as well; all primaries ended up on N0 and the replicas
were unassigned. One after the other I brought N1, N2, N3 back up again. At
this point, all my primaries are in DC1 and the replicas have moved back to
DC2 (on N2 and N3).
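
(For anyone repeating this test, cluster health is a quick way to watch the
recovery while nodes go down and come back:

curl 'http://localhost:9200/_cluster/health?pretty'
# green = all primaries and replicas assigned, yellow = some replicas unassigned, red = at least one primary unassigned
)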

My problem seems solved for now, and I am currently performing capacity testing.

Cheers,
Kunal

On Monday, May 6, 2013 8:19:42 AM UTC-4, Kunal Modi wrote:


Very interesting. Thanks!

One more question, though: If the network between the two data centers is
cut, and discovery.zen.minimum_master_nodes=2, is there a chance of a
split-brain situation in which each data center's pair of nodes vote
themselves the master?

On Wednesday, May 8, 2013 5:05:19 PM UTC-4, Kunal Modi wrote:


Assume discovery.zen.minimum_master_nodes=1.
Let's say the master belonged to DC1 and the link between the DCs is broken.
Each DC will now have its own master and function independently:
N0, N1, N2 function as one cluster, let's say with N0 as master;
N3, N4, N5 function as another cluster, let's say with N3 as master.

When the link is restored and all nodes see each other, I think it will
elect a single master again and redistribute the shards.

Having discovery.zen.minimum_master_nodes=2
can further complicate matters: each DC still has enough master-eligible nodes
to satisfy that minimum, so each DC could still elect its own master, i.e. a
split brain with two masters in total.
Taking nodes out of service (OOS) can be an issue in that case.

I will have to test this, but my hunch is that it may work this way.
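
To actually rule the split brain out, my understanding is the minimum would
have to be a majority of all master-eligible nodes; an untested sketch for our
6-node case:

# elasticsearch.yml on all 6 nodes
# majority of 6 master-eligible nodes = (6 / 2) + 1 = 4
discovery.zen.minimum_master_nodes: 4
# trade-off: if an entire DC goes down, the surviving 3 nodes no longer reach 4
# and cannot elect a master on their own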

Cheers,
Kunal

P.S. sorry for the delay.

On Thursday, May 9, 2013 3:10:52 PM UTC-4, InquiringMind wrote:


Have you tested it?

I'm about to start a similar deployment where DC1 is used regularly and DC2
is used when the DC1 datacenter is down.

On Thursday, May 30, 2013 3:39:22 AM UTC+3, Kunal Modi wrote:


I have tested the "cluster.routing.allocation.awareness.attributes: rack_id"
feature and it works if one of the DCs goes down.
I haven't tested the "discovery.zen.minimum_master_nodes=2" situation and the
split-brain problem.

Cheers,
Kunal

On Thu, May 30, 2013 at 6:19 AM, Rami Gabai ramigabai@gmail.com wrote:


Thanks!

Did you test it with discovery.zen.minimum_master_nodes=1?

I will set up a full test in the coming weeks and report the results.

I'm planning to use 2 DCs, one in Austin and one in Chicago, where DC1 runs
2 nodes and DC2 runs 1 node.
DC2 is used only in case DC1 fails.

My main concern is what happens when both DCs are back up and whether the
indices get merged back together seamlessly.

On Thursday, May 30, 2013 3:25:08 PM UTC+3, Kunal Modi wrote:


Hi. Just a question about your two geo-located DCs (Austin and Chicago). How do you plan to handle the negative effect of latency and delays on indexing and searching in this situation?
The only way I have found is to switch the translog into async mode:

index.translog.durability: async
index.translog.sync_interval: 10s

In this case the translog is fsynced in the background (every sync_interval) instead of on each request, so up to about 10 seconds of acknowledged writes can be lost if a node crashes. Search performance isn't the concern here, because with the rack_id awareness option searches are served by the shard copies in the addressed "DC" (rack_id). However, the async translog is very dangerous in a hardware failure situation (documents may be lost).
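
For reference, both of those are index-level settings; a sketch of applying them at index creation time (with a placeholder index name) could look like:

curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "index.translog.durability": "async",
    "index.translog.sync_interval": "10s"
  }
}'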

Do you know any other way to handle the lag between two DCs?