New feature request? Automatic index scaling using aliases without over-allocation

Hi,

I have been thinking about an index scaling strategy based on aliases, but I
do not think ES currently supports it in a fully automatic way. Maybe this
is a reasonable use case that could lead to a few new feature requests.
Let me explain:

Imagine a situation where you know that you need to create an index with
many shards (20-40, or even more). The primary reason is not that your
index will be huge, but that you have limited resources per node (RAM,
disk size, etc.). So you need your index to be distributed over many "flat"
machines. Getting more resources per node is impossible, but getting a lot
more nodes is much easier. So to spread indices across such a cluster, you
would need to over-allocate shards in advance, or use aliases (I will get
to this point later).

Now, over-allocation is not always a good solution, especially if you
(again) have limited resources, such as available file descriptors.
Over-allocation means that when you create the index, all of its shards are
initially allocated to a single node. Even though they might be reallocated
to other nodes very quickly, the resource peak caused by creating a single
over-allocated index can exceed the available limits. The same applies when
your cluster shrinks: more of the index's shards end up on a single node.
In a "normal" situation this may not be a big deal, but if you do not have
enough resources per node, I think it is much better to hit a red cluster
health status than Java exceptions (like "too many open files").

The other option is to use index aliases. The problem with aliases is that
they have to be managed by the client. I think it might make sense to
consider a different kind of alias that works in a more automated way. Let
me think out loud:

  • In the index template, I could say that if an index meets a specific
    condition (for example, its name matches a regex), it is assigned an
    alias automatically (this should be perfectly doable, no?)
  • When indexing against an alias name (currently not possible when the
    alias points to more than one index, right?), if the indices share the
    same type, ES could index the document into that type in one of the
    matching indices. Why not? (Child/parent documents probably need to
    take this into account; still doable?)
  • Is there any problem with deletes?
  • Can I set up an upper limit on how many shards can be allocated per
    single node?
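Until something like this exists, the alias bookkeeping has to live in the
client. Below is a minimal Python sketch of what that client-side logic might
look like. The regex rules (`ALIAS_RULES`), the helper names and the "write to
the smallest member index" policy are all hypothetical; only the
`{"actions": [{"add": {...}}]}` body shape is taken from the ES `_aliases` API.

```python
import re

# Hypothetical rules: any index whose name matches the regex should join the
# alias automatically (the part the post would like ES to do by itself).
ALIAS_RULES = {
    r"^logs-\d+$": "logs",
}


def aliases_for(index_name, rules=ALIAS_RULES):
    """Return the aliases an index should belong to, based on regex rules."""
    return sorted(alias for pattern, alias in rules.items()
                  if re.match(pattern, index_name))


def alias_actions(index_names, rules=ALIAS_RULES):
    """Build a request body for the _aliases endpoint that adds every
    matching index to its alias(es)."""
    actions = []
    for index in index_names:
        for alias in aliases_for(index, rules):
            actions.append({"add": {"index": index, "alias": alias}})
    return {"actions": actions}


def pick_write_index(doc_counts):
    """Hypothetical write policy: send a new document to the member index
    that currently holds the fewest documents (ties broken by name)."""
    return min(doc_counts, key=lambda name: (doc_counts[name], name))


if __name__ == "__main__":
    print(alias_actions(["logs-1", "logs-2", "other"]))
    print(pick_write_index({"logs-1": 900, "logs-2": 120}))
```

Running it prints an `_aliases` body that adds `logs-1` and `logs-2` (but not
`other`) to `logs`, then `logs-2` as the write target. Note that with such a
policy a document can land in any member index, so a later get or delete by
ID has to know (or look up) which index received it.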

With these, it would be possible to start a small index with just a few
shards and grow and shrink it as needed without the risk of hitting
system-level limitations. Of course, I could hit a red cluster health
status, but that is a different story and perfectly acceptable.

Thoughts?

Regards,
Lukas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

> • Can I setup the upper limit of how many shards can be allocated per
>   single node?

It seems this is already implemented (see "Total Shards per Node"):
http://www.elasticsearch.org/guide/reference/index-modules/allocation/
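To see what such a per-node cap buys, here is a toy Python simulation. It is
not the ES allocator; it only assumes that two copies of the same shard never
share a node and uses the `index.routing.allocation.total_shards_per_node`
setting from the page above (the numbers are illustrative). Copies that fit
nowhere stay unassigned instead of piling onto one node, which is the "red
health instead of too many open files" trade-off from the original post.

```python
from collections import defaultdict

# Index settings as they might be sent when creating the index; the last key
# is the per-node cap (example values only).
INDEX_SETTINGS = {
    "index.number_of_shards": 20,
    "index.number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 10,
}


def allocate(num_shards, num_replicas, nodes, max_per_node):
    """Toy allocator (not the real ES one).

    Places every shard copy on the least-loaded node that is still below
    max_per_node and does not already hold a copy of the same shard.
    Returns (shard copies per node, list of unassigned (shard, copy) pairs).
    """
    load = {node: 0 for node in nodes}
    holders = defaultdict(set)  # shard number -> nodes holding a copy of it
    unassigned = []
    for shard in range(num_shards):
        for copy in range(1 + num_replicas):  # copy 0 is the primary
            candidates = [n for n in nodes
                          if load[n] < max_per_node and n not in holders[shard]]
            if not candidates:
                # No room left: leave the copy unassigned rather than
                # overloading a node.
                unassigned.append((shard, copy))
                continue
            # Least-loaded node first; ties broken by name for determinism.
            target = min(candidates, key=lambda n: (load[n], n))
            load[target] += 1
            holders[shard].add(target)
    return load, unassigned


if __name__ == "__main__":
    load, unassigned = allocate(
        INDEX_SETTINGS["index.number_of_shards"],
        INDEX_SETTINGS["index.number_of_replicas"],
        ["node1", "node2", "node3"],
        INDEX_SETTINGS["index.routing.allocation.total_shards_per_node"],
    )
    print(load, len(unassigned))
```

With 20 shards, 1 replica, 3 nodes and a cap of 10, 30 copies are placed (10
per node) and the other 10 stay unassigned. Since some of those are
primaries, the cluster would go red, but no node ever holds more than 10
shard copies.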
