Hi, I have 5 unique routing keys ("a", "b", "c", "d", "e") in my application.
My ES setup consists of 5 shards.
When I check the current state (version 6.6.2), the routing keys are allocated to shards like this:
a: shard 4
b: shard 2
c: shard 0
d: shard 3
e: shard 4
Note that shard 1 receives no routing key, while shard 4 receives two (a and e).
Question:
Is there a way to ensure that my 5 routing keys are distributed across the 5 shards, with one key per shard?
I understand that a Murmur3 hash of the routing value is involved in choosing the shard.
I am not using the default routing; when indexing, I specify the routing explicitly, e.g. routing=a.
My problem is that after I finish indexing documents for all my routing keys, shard 1 remains empty.
I need a way to make the documents for each of a, b, c, d and e (with 5 shards in ES) go to a separate shard.
I understand that this depends entirely on the hash generated for the given keys.
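To make the dependence on the hash concrete, here is a minimal Python sketch of how I understand Elasticsearch 6.x picks a shard for a custom routing value: the routing string is encoded as two bytes per Java char (low byte first), hashed with 32-bit Murmur3 (x86 variant, seed 0), and the shard is `floorMod(hash, routing_num_shards) / routing_factor`. This rests on assumptions: that it matches ES's internal `Murmur3HashFunction` and `OperationRouting`, that `index.number_of_routing_shards` was left at its default (which I believe equals `number_of_shards` in 6.x, giving a routing factor of 1), and that `routing_partition_size` is not set. The function names are my own.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as a signed int like Java's."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    blocks_end = len(data) & ~0x3
    # Body: mix in each 4-byte little-endian block.
    for i in range(0, blocks_end, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
        h1 = (_rotl32(h1, 13) * 5 + 0xE6546B64) & MASK32
    # Tail: mix in the 1 to 3 leftover bytes, if any.
    k1 = 0
    tail = len(data) & 0x3
    if tail == 3:
        k1 ^= data[blocks_end + 2] << 16
    if tail >= 2:
        k1 ^= data[blocks_end + 1] << 8
    if tail >= 1:
        k1 ^= data[blocks_end]
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
    # Finalization mix (fmix32).
    h1 ^= len(data)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def es6_shard_for(routing: str, num_shards: int,
                  routing_num_shards: Optional[int] = None) -> int:
    """Assumed ES 6.x shard choice for a custom routing value."""
    if routing_num_shards is None:
        routing_num_shards = num_shards  # assumed 6.x default
    routing_factor = routing_num_shards // num_shards
    # UTF-16 code units, low byte first: two bytes per Java char.
    h = murmur3_x86_32(routing.encode("utf-16-le"))
    # Python's % is non-negative for a positive divisor, like Math.floorMod.
    return (h % routing_num_shards) // routing_factor


if __name__ == "__main__":
    for key in "abcde":
        print(key, es6_shard_for(key, 5))
```

Comparing the printed shard numbers against the allocation observed above is a quick way to check these assumptions before relying on the sketch.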
My base problem:
In the example above, routing=a and routing=e both go to the same shard, i.e. shard 4.
Assume a and e are document types that have a few fields in common. When I query documents with routing=a on those common fields, ES also checks all the routing=e documents on that shard, which lowers performance considerably. Check this thread
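Routing only selects which shard is searched; it does not restrict matches to documents indexed with that routing value. A common way to keep routing=e documents out of routing=a results is to also store the key in a field and filter on it. The sketch below only builds the request path and JSON body (no HTTP call is made); the index name `my-index` and the fields `doc_type` (assumed to be mapped as `keyword`) and `title` are hypothetical placeholders. A `term` clause in filter context does not affect scoring and can be cached, but the e documents still live on that shard, so this limits the overhead rather than removing it.

```python
import json
from typing import Dict, Tuple
from urllib.parse import quote


def routed_search(index: str, routing_value: str, key_field: str,
                  match_field: str, match_text: str) -> Tuple[str, Dict]:
    """Build path and body for a search restricted to one routing value."""
    # ?routing=... sends the search only to the shard this value hashes to.
    path = f"/{index}/_search?routing={quote(routing_value, safe='')}"
    body = {
        "query": {
            "bool": {
                # Scored match on a field shared by several keys.
                "must": [{"match": {match_field: match_text}}],
                # Non-scoring filter that drops documents of other keys
                # which happen to share the same shard (e.g. e on shard 4).
                "filter": [{"term": {key_field: routing_value}}],
            }
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = routed_search("my-index", "a", "doc_type", "title", "some text")
    print("GET", path)
    print(json.dumps(body, indent=2))
```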
An issue I see with shard allocation by routing:
If I have 50 shards and 100 routing keys, it may happen that, because of the way the hash distributes keys, some shards remain empty forever.
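Assuming an ideal uniform hash (which Murmur3 approximates, with no guarantee for a handful of keys), the chance of this can be estimated directly. Each key misses a given shard with probability 1 - 1/n, so with k independent keys that shard stays empty with probability (1 - 1/n)^k, and summing over the n shards gives the expected number of empty shards. The function names are my own.

```python
from math import perm


def expected_empty_shards(num_shards: int, num_keys: int) -> float:
    """Expected number of shards receiving no key, for an ideal uniform hash."""
    # A given shard misses all keys with probability (1 - 1/n)^k;
    # linearity of expectation sums this over the n shards.
    return num_shards * (1 - 1 / num_shards) ** num_keys


def prob_one_key_per_shard(num_shards: int, num_keys: int) -> float:
    """Probability that every key lands on a different shard."""
    # Injective assignments n!/(n-k)! out of n^k equally likely ones;
    # perm() returns 0 when there are more keys than shards.
    return perm(num_shards, num_keys) / num_shards ** num_keys


if __name__ == "__main__":
    print(f"{prob_one_key_per_shard(5, 5):.4f}")
    print(f"{expected_empty_shards(5, 5):.4f}")
    print(f"{expected_empty_shards(50, 100):.4f}")
```

For 5 keys on 5 shards, the chance of a perfect one-key-per-shard layout is 5!/5^5, about 3.8%, and on average about 1.64 shards stay empty, so an empty shard in the example above is the expected outcome. For 100 keys on 50 shards, about 6.6 shards are expected to stay empty.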
Suggestion:
There could be a routing-key-to-shard-number mapping in the index settings, which would give this control to the user, so that the user decides which shard each routing key maps to.
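Until something like that exists, one client-side workaround is to keep your own mapping from each logical key to a routing value that lands on a still-free shard, found by trying suffixed variants (`a`, `a-1`, `a-2`, ...). This is a hypothetical sketch, not an Elasticsearch feature: `assign_routing_values` is my own name, and `stand_in_shard` uses CRC32 only so the example runs on its own. It does NOT reproduce ES's shard choice; for a real index you would pass in a function matching ES's Murmur3-based calculation and verify where documents actually end up.

```python
import zlib
from typing import Callable, Dict, List

ShardFn = Callable[[str, int], int]


def stand_in_shard(routing: str, num_shards: int) -> int:
    # Stand-in only, NOT Elasticsearch's hash: replace with a function
    # that reproduces ES's Murmur3-based shard calculation.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def assign_routing_values(keys: List[str], num_shards: int,
                          shard_fn: ShardFn = stand_in_shard,
                          max_tries: int = 10_000) -> Dict[str, str]:
    """Greedily map each key to a routing value on a shard no other key uses."""
    if len(keys) > num_shards:
        raise ValueError("more keys than shards: one key per shard is impossible")
    used_shards = set()
    mapping: Dict[str, str] = {}
    for key in keys:
        for i in range(max_tries):
            candidate = key if i == 0 else f"{key}-{i}"
            shard = shard_fn(candidate, num_shards)
            if shard not in used_shards:
                used_shards.add(shard)
                mapping[key] = candidate
                break
        else:
            raise RuntimeError(f"no free shard found for {key!r}")
    return mapping


if __name__ == "__main__":
    for key, routing in assign_routing_values(list("abcde"), 5).items():
        print(key, "->", routing, "shard", stand_in_shard(routing, 5))
```

The resulting mapping has to be stored and used consistently at both index and search time, and it only holds for the current number of primary shards.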
At the moment the routing key is hashed instead of the id. If you have a large number of routing keys, probability will give a reasonably even distribution. If you need more control and have few routing keys, consider using multiple separate indices instead.
I already have 200 indices, and choosing the index based on several factors is already handled manually in my code.
Each index contains unique types, which I call the routing keys, and which I want to distribute evenly across its shards (my original question).
This is like Types (routing keys) inside Types (my indices).
I wonder why control over choosing the shard, when a routing value is given, is not handed to the user. I think this would make ES more flexible and give users better control.