Allocation awareness and node replacement


(Otis Gospodnetić) #1

Hi,

This is a question about what ES does or doesn't do with shards/
replicas when:

  1. a box/node with specific shard/replica allocation filtering goes
    down, and
  2. a new box is added to the cluster in order to replace the downed
    box from 1)

Consider, for example, a cluster of 4 boxes:

  • box #1 hosts shard #1
  • box #2 hosts shard #2
  • box #3 hosts replica of shard #1
  • box #4 hosts replica of shard #2

Assume allocation awareness was used to "tag" the above boxes/nodes in
order to maintain full control over where ES should place shards and
replicas for our Foo index.

Questions:
A) What happens when box #4 dies?
Does ES automatically replicate under-replicated shards?

I think the answer is No, but in case ES does automatically re-replicate,
where would it place a new replica of shard #2 in this example?

B) Same questions as above, but for box #2 going down. (In other
words, the box that dies is the one with the original shard, not its
replica.)

I think the answer should be the same as for A), but want to
double-check.

C) If box #4 dies and ES doesn't automatically allocate a replica of
shard #2 (which was on that box) to another box, what should an admin
do in order to preserve the same shard & replica allocation as before?
In other words, if he wants to remain in full control of allocation
and of what is placed where, what are the steps he should follow?

D) Related to C - if an admin adds a new box to the cluster - a box
meant as a replacement for box #4, but not yet tagged with allocation
awareness stuff - will ES automatically "grab" it and start putting
new shards on it?
In other words, if the admin just adds a "blank" box to the
cluster, will he lose control over what ES will put there because ES
could start using it right away?

E) Related to D - what should the admin do when adding a new box #4
replacement to the cluster?
Does he have to make sure he applies allocation awareness stuff to
this new box before it's added to the cluster?
Does he have to make some explicit calls to ES to tell it "Hey, I
made this new box #4 available for you and it's allocation aware - now
go and create a replica of shard #2 on it"?

Sorry for all these questions, but hopefully others will find this
useful, too. Thanks!

Otis

Search Analytics - http://sematext.com/search-analytics/index.html


(Shay Banon) #2

I can't answer the question without knowing which values each box / node has for the awareness attribute, and whether it's forced awareness or not. Also, your mail confuses allocation awareness with allocation filtering (two different things; re-read this: http://www.elasticsearch.org/guide/reference/modules/cluster.html).


(Otis Gospodnetić) #3

Hi,

On Feb 13, 1:35 pm, Shay Banon kim...@gmail.com wrote:

I can't answer the question without knowing which values each box / node has for the awareness attribute, and whether it's forced awareness or not.

What's the difference between forced awareness and just allocation
awareness? Isn't plain allocation awareness really forced, because if
I didn't want to force awareness, I would not bother with allocation
awareness in the first place and would just let ES allocate however it
wanted?
Or maybe "forced" in the naming is a bit misleading? ...because I can
see behavioural differences between what's described in "Shard
Allocation Awareness" section and what's in "Forced Awareness" sub-
section.

So, maybe the Q that would clarify the difference is:
What can happen with just plain allocation awareness that cannot
happen with forced allocation?
I do see "...we would like never to have more replicas than needed
allocated on a specific group of nodes with the same awareness
attribute value." there, but A) I'm having a hard time creating a
practical mental picture of this scenario, I think primarily because
I'm not sure how to interpret the "more replicas than needed" part
("than needed"...by who?), and B) am wondering if this is the only
difference?

Also, your mail confuses allocation awareness with allocation filtering (two different things; re-read this:
http://www.elasticsearch.org/guide/reference/modules/cluster.html).

I've re-read this now, but I can't say it's crystal clear :(

There is a section titled "Shard Allocation Filtering", but it sounds
like it's more about index allocation, not shard allocation.
But maybe it's really about shard allocation in a sense that when you
create an index, its shards will be placed only on nodes with tags
that match "index.routing.allocation.include.tag" value used when the
index was created?
In either case, it sounds like "Allocation Filtering" is about telling
ES on which nodes to place index shards (it says nothing about
replicas?) when an index is created. Is this correct?
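To picture what a tag-based include filter does, here is a toy Python sketch (not Elasticsearch code). It assumes, as current Elasticsearch documentation describes, that `index.routing.allocation.include.tag` is applied to every copy of every shard of the index, primaries and replicas alike, and that a comma-separated value means "any of these tags". The node names and `row1`/`row2` tags are hypothetical, and wildcard matching is deliberately left out.

```python
# Hypothetical nodes and tag values, for illustration only.
NODES = {
    "box1": {"tag": "row1"},
    "box2": {"tag": "row1"},
    "box3": {"tag": "row2"},
    "box4": {"tag": "row2"},
}


def eligible_nodes(nodes, include_tag):
    """Return the nodes whose 'tag' attribute matches any value in a
    comma-separated include list, mimicking (in a simplified way)
    index.routing.allocation.include.tag. Assumption: the same filter
    applies to primaries and replicas; exact matches only."""
    wanted = {v.strip() for v in include_tag.split(",") if v.strip()}
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get("tag") in wanted)


print(eligible_nodes(NODES, "row1"))        # ['box1', 'box2']
print(eligible_nodes(NODES, "row1,row2"))   # ['box1', 'box2', 'box3', 'box4']
```

Under that assumption, every copy of a shard (primary or replica) can only land on `box1` or `box2` when the index is tagged `row1`.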

Compared to "Shard Allocation Filtering", the "Allocation Awareness"
functionality seems to also involve replica allocation. The only
difference I could spot between the "Forced Allocation Awareness" (FAA)
sub-section and the "Shard Allocation Awareness" (SAA) section is that
with FAA, replicas will not be created/allocated until nodes with a new
attribute value are added to the cluster, while with SAA they will be
created right away and will automatically move to other nodes when/if
nodes with a new awareness attribute value are added. Is this correct?
Is this the main or only difference?

In the end, I'm still not sure which of the above to use for my
needs. Here is what I need:

  • A new index created each day - imagine this as a grid of nodes where
    each row of nodes hosts shards and replicas for the index for the
    given day
  • Control over which nodes should host each day's index - on each day
    I want to create a new index and put it in the new row of nodes in
    this grid
  • Control over which nodes in the row should host shards and which
    replicas - e.g. first 2 nodes in the row host 1 shard each, the next 4
    nodes host 1 replica each for a total of 5 shard replicas per row
  • Ability to put more than 1 index on the same set of nodes - at some
    point in time go back to the row holding oldest indices and create an
    index for the new day on them
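The "grid of rows" idea above can be sketched as a small helper that picks a row for each day's index and builds the index settings that would confine it there. This is a minimal Python sketch under stated assumptions: the `rowN` tag naming, the `foo-YYYY-MM-DD` index names, and the helper functions are all hypothetical; only the setting names come from Elasticsearch. It only chooses the row; it does not attempt the per-node primary/replica split from the third bullet.

```python
from datetime import date

NUM_ROWS = 3            # hypothetical number of rows in the node grid
SHARDS_PER_INDEX = 2    # "first 2 nodes in the row host 1 shard each"
REPLICAS_PER_SHARD = 2  # 2 shards x 2 replicas = the "next 4 nodes" of replicas


def row_for_day(day: date, num_rows: int = NUM_ROWS) -> str:
    """Map a day to a (hypothetical) row tag. Consecutive days use
    consecutive rows, wrapping back to the oldest row after num_rows days."""
    return f"row{day.toordinal() % num_rows + 1}"


def daily_index(day: date) -> tuple:
    """Return a hypothetical index name plus the settings that would
    restrict all of that index's shard copies to the day's row."""
    name = f"foo-{day.isoformat()}"
    settings = {
        "index.number_of_shards": SHARDS_PER_INDEX,
        "index.number_of_replicas": REPLICAS_PER_SHARD,
        "index.routing.allocation.include.tag": row_for_day(day),
    }
    return name, settings


if __name__ == "__main__":
    for d in (date(2012, 2, 13), date(2012, 2, 14),
              date(2012, 2, 15), date(2012, 2, 16)):
        print(daily_index(d))
```

With three rows, the fourth day lands back on the row holding the first day's index, which matches the last bullet's "go back to the row holding oldest indices".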

It feels like I should use just SAF (Shard Allocation Filtering), but
that section doesn't mention replicas...

Thanks,
Otis


(Shay Banon) #4

Regarding shard allocation awareness:

The regular, non-forced one simply means that the allocation will aim, if possible, to spread a shard and its replicas across different groups of nodes (divided by the node awareness attribute values). But it will still allocate all the shards and replicas on a cluster whose nodes all have a single awareness attribute value. Take rack_id for example: a single rack might have enough servers to hold all the shards and replicas if it's the only rack in the cluster, but if you have two racks, you would like the shards and their replicas spread across them.
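A toy Python model can make the non-forced behaviour concrete. It is not Elasticsearch's allocator: it simply places the copies of one shard round-robin across whichever awareness values are currently present. The function name and the `rack_one`/`rack_two` values are hypothetical, and it ignores node-level constraints (for example, that two copies of the same shard should not share a node).

```python
from collections import Counter


def spread_copies(num_copies, groups):
    """Toy model of non-forced awareness: distribute the copies of one
    shard round-robin over the awareness values present in the cluster.
    With a single value, every copy still lands in that one group."""
    if not groups:
        raise ValueError("need at least one group of nodes")
    return Counter(groups[i % len(groups)] for i in range(num_copies))


# One rack: both copies (primary + 1 replica) are still allocated there.
print(dict(spread_copies(2, ["rack_one"])))              # {'rack_one': 2}
# Two racks: the copies are spread across them.
print(dict(spread_copies(2, ["rack_one", "rack_two"])))  # {'rack_one': 1, 'rack_two': 1}
```

The point of the model is the first call: plain awareness is a preference, so losing a rack never leaves copies unassigned as long as enough nodes remain.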

Forced awareness means that the number of copies of a shard allocated to a group of nodes will not exceed the average number of copies expected per group. Take AWS availability zones as an example: with two zones and an index with 3 replicas (4 copies total per shard), each zone is expected to hold 2 copies. If one availability zone goes down, you don't want the 2 copies that were in it to be reallocated to the remaining availability zone. That is also the reason why forced awareness requires listing the values allowed for it (so the max number of copies of a shard can be calculated per group of nodes).
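The availability-zone example can be worked through with a second toy model. It assumes the per-group cap is the ceiling of total copies divided by the number of listed values, which is how "average number of copies expected per group" reads here; it is not Elasticsearch source. The listed values correspond to what Elasticsearch takes from a setting such as `cluster.routing.allocation.awareness.force.zone.values`; the zone names and the function are hypothetical.

```python
import math


def forced_allocation(num_copies, forced_values, live_values):
    """Toy model of forced awareness.

    forced_values: every awareness value listed up front (e.g. all zones).
    live_values:   values that currently have nodes in the cluster.
    Each live group may hold at most ceil(num_copies / len(forced_values))
    copies; anything beyond that stays unassigned instead of piling onto
    the surviving groups. Returns (copies per group, unassigned copies).
    """
    if not forced_values:
        raise ValueError("forced awareness needs at least one listed value")
    cap = math.ceil(num_copies / len(forced_values))
    placed = {}
    remaining = num_copies
    for value in forced_values:
        if value in live_values and remaining > 0:
            placed[value] = min(cap, remaining)
            remaining -= placed[value]
    return placed, remaining


# 3 replicas = 4 copies, both zones up: 2 copies per zone.
print(forced_allocation(4, ["zone1", "zone2"], {"zone1", "zone2"}))  # ({'zone1': 2, 'zone2': 2}, 0)
# zone2 goes down: zone1 keeps its 2 copies, the other 2 stay unassigned.
print(forced_allocation(4, ["zone1", "zone2"], {"zone1"}))           # ({'zone1': 2}, 2)
```

Plain awareness, given enough nodes, would instead put all 4 copies in zone1 after the failure; the cap is why the full list of values has to be known in advance.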