Question: shard allocation awareness and shard allocation filtering

Hello,

My question is about the cooperation between shard allocation awareness
and shard allocation filtering.

We have two kinds of nodes: those with ssds (used for indexing and searching
recent data) and those with large spinning disks (used for archiving old
indices).

I'd like to set up a mechanism to move old indices from ssds to spinning
disks.

The first solution uses the reroute command in the cluster API. However, it feels
unnatural since you have to do it shard by shard and decide on the target node yourself.

What I want to achieve is the following:

  1. stick recent indices (the current one being written to) to ssds. They
    have 2 copies.
  2. at some point (disk usage on ssds is above 65%), one copy is moved to
    larger boxes (1 copy stays on ssd to help search, 1 copy on a large box)
  3. when disk space is scarce on the ssd boxes (90%), we simply drop the copy
    present on ssd. Since we don't care that much about old data, having only
    one copy is not an issue.

I have tried to implement this with shard allocation awareness and
allocation filtering but it does not seem to work as expected.

Nodes have a flavor attribute depending on their hardware (ssd or iodisk).
The cluster uses shard allocation awareness based on the flavor attribute
(cluster.routing.allocation.awareness.attributes: flavor).
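
For completeness, the node-side setup looks roughly like this. This is only a
sketch of the elasticsearch.yml lines implied by the above; the flavor attribute
name is the one used in this thread:

# elasticsearch.yml on an ssd box
node.flavor: ssd
cluster.routing.allocation.awareness.attributes: flavor

# elasticsearch.yml on an archive (spinning disk) box
node.flavor: iodisk
cluster.routing.allocation.awareness.attributes: flavor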

  1. My index template has routing.allocation.require: ssd to force all
    copies onto ssds first.
  2. At some point, I drop the requirement (effectively setting
    routing.allocation.require to an empty value). I expect flavor awareness
    to move one copy to the large (iodisk) boxes.
  3. At a later point, I'll set number_of_replicas to 0 and change
    routing.allocation.require to iodisk to drop the shard copy on ssds
    (a sketch of these three settings calls follows this list).
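
For clarity, here is roughly what the three steps above look like against the
index settings API. This is only a sketch: the index name is taken from the
trace below, and I am writing the setting in its per-attribute form
(index.routing.allocation.require.flavor), which is what the flavor attribute
maps to.

# Step 1: require all copies on ssd nodes (set in the index template or per index)
curl -XPUT 'localhost:9200/2014-10-16.01/_settings' -d '
{"index.routing.allocation.require.flavor": "ssd"}'

# Step 2: drop the requirement so one copy may move to the iodisk boxes
curl -XPUT 'localhost:9200/2014-10-16.01/_settings' -d '
{"index.routing.allocation.require.flavor": ""}'

# Step 3: keep a single copy and pin it to the iodisk nodes
curl -XPUT 'localhost:9200/2014-10-16.01/_settings' -d '
{"index.number_of_replicas": 0,
 "index.routing.allocation.require.flavor": "iodisk"}'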

Sadly, allocation filtering and shard allocation awareness do not seem to cooperate well:
when a new index is created, one copy goes to ssds and the other is not
allocated anywhere (the index stays in a yellow state).

Using curl -XPUT localhost:9200/_cluster/settings -d
'{"transient":{"logger.cluster.routing.allocation":"trace"}}',
I have observed what happens when a new index is created.

[2014-10-16 06:53:19,462][TRACE][cluster.routing.allocation.decider]
[bungeearchive01-par.storage.criteo.preprod] Can not allocate
[[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node
[qK34VLdhTferCQs2oNJOyg] due to [SameShardAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider]
[bungeearchive01-par.storage.criteo.preprod] Can not allocate
[[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node
[gE7OTgevSUuoj44RozxK0Q] due to [AwarenessAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider]
[bungeearchive01-par.storage.criteo.preprod] Can not allocate
[[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node
[Y2k9qXfsTx6X2iQTxg9RBQ] due to [AwarenessAllocationDecider]
[2014-10-16 06:53:19,463][TRACE][cluster.routing.allocation.decider]
[bungeearchive01-par.storage.criteo.preprod] Can not allocate
[[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]] on node
[FwWc2XPPRWuje2KH6AlDEQ] due to [FilterAllocationDecider]
[2014-10-16 06:53:19,492][TRACE][cluster.routing.allocation.allocator]
[bungeearchive01-par.storage.criteo.preprod] No Node found to assign shard
[[2014-10-16.01][3], node[null], [R], s[UNASSIGNED]]

This transcript shows that:

  • shard 3's primary is on node qK34VLdhTferCQs2oNJOyg (flavor: ssd), which
    prevents the replica from being placed there
  • it cannot be placed on gE7OTgevSUuoj44RozxK0Q (ssd as well) because the
    allocator tries to maximize dispersion across flavors
  • it cannot be placed on Y2k9qXfsTx6X2iQTxg9RBQ for the same reason
  • it cannot be placed on FwWc2XPPRWuje2KH6AlDEQ (flavor: iodisk) because of
    the filter

Questions:

  • am I doing it wrong?
  • should I stick with a set of reroute commands?
  • are awareness and filtering supposed to cooperate?

Any help will be appreciated

--
Grégoire Seux


On Thu, Oct 16, 2014 at 11:42 AM, Grégoire Seux
kamaradclimber@gmail.com wrote:

  • are awareness and filtering supposed to cooperate?

A quick look at the code confirms that allocation deciders are fully orthogonal.
Should I open a GitHub issue to discuss adding support for cooperating
deciders?

--
Grégoire Seux


Hi Grégoire

A couple of comments:

  1. at some point (disk on ssds is above 65%), one copy is moved to larger
    boxes (1 copy is still on ssd to help search, 1 copy on large box)

Allocation awareness causes Elasticsearch to spread the shard copies
across the different values of the attribute. However, it also changes the
search behavior: it tries to execute searches on nodes that have the same
attribute values as the node that initially received the search. In your case
it means that if an ssd node receives the search, it will run on ssd nodes;
otherwise it will run on iodisk nodes. I'm not sure this is what you want.

  1. At some point, I drop the requirement (effectively setting
    routing.allocation.require to an empty value). I expect flavor awareness
    to move one copy to the large (iodisk) boxes.

ES tries to balance shards from the cluster perspective. It gives some
weight to spreading out the shards of an index, but this is just one
parameter. In your case I suspect you have way more shards on the iodisk
nodes than on the ssd nodes, which means that balancing will try to move
shards from iodisk to ssd if it can, but not the other way around (as you expect).
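
For reference, the weights the balancer uses can be tuned dynamically,
something along these lines (the values below are just placeholders, not a
recommendation for your case):

curl -XPUT localhost:9200/_cluster/settings -d '
{"transient": {
  "cluster.routing.allocation.balance.shard": 0.45,
  "cluster.routing.allocation.balance.index": 0.50
}}'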

are awareness and filtering supposed to cooperate?

I think they should, but I'm not sure it will achieve what you want to do
(see the comment above). That said, I can confirm that shard allocation awareness
and filtering on the same attribute may get in each other's way. I would
suggest you open an issue on GitHub indicating that shard allocation
awareness causes unassigned shards when one of the attribute values is
blocked by an allocation filter (it doesn't matter which filter is
used; see the sketch below). You would expect it to behave the same as if those nodes were down (in
which case the shards would be assigned). Try to give a concise reproduction
using two different attributes for filtering and awareness.
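
As an illustration only, the same-attribute case you describe can be
reproduced roughly like this (the index name, attribute values and node
layout are placeholders; ES 1.x API assumed):

# Assume three nodes: two with node.flavor: ssd, one with node.flavor: iodisk,
# and cluster.routing.allocation.awareness.attributes: flavor on all of them.

# Create an index whose copies are all required to live on ssd nodes
curl -XPUT 'localhost:9200/repro' -d '
{"settings": {
  "number_of_shards": 1,
  "number_of_replicas": 1,
  "index.routing.allocation.require.flavor": "ssd"
}}'

# Expected: both copies end up on the two ssd nodes.
# Observed (per this thread): the replica stays unassigned because awareness
# wants to place it on an iodisk node, which the require filter forbids.
curl -XGET 'localhost:9200/_cluster/health/repro?pretty'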

Cheers,
Boaz


Hi Boaz,

Thanks for your answer.

In your case it means that if an ssd node receives the search, it will run
on ssd nodes; otherwise it will run on iodisk nodes. I'm not sure this is
what you want.

This is the effect I am looking for.
During the period when I have a copy on both ssds and iodisks, I prefer
to query the fastest ones.

ES tries to balance shards from the cluster perspective. It gives some
weight to spreading out the shards of an index, but this is just one
parameter. In your case I suspect you have way more shards on the iodisk
nodes than on the ssd nodes, which means that balancing will try to move
shards from iodisk to ssd if it can, but not the other way around (as you expect).

You're right, I would indeed like to have way more shards on iodisk.
Reading
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/AllocationDecidersModule.java#L66,
I don't see any decider that would prevent allocation from ssd to iodisk.
I understand that ES will see the situation as unbalanced and will try
to rebalance it, but I expect the allocation filter to prevent those backward
reallocations.

I would suggest you open an issue on GitHub indicating that shard
allocation awareness causes unassigned shards when one of the
attribute values is blocked by an allocation filter

Done: "rack awareness allocation and allocation filtering lead to unassigned shards", https://github.com/elastic/elasticsearch/issues/8178

--
Grégoire
