Finding optimum number of shards for custom routing


#1

Hi,
I am trying to find the optimum number of shards for my data.
I have a custom routing value in my application.

I know the formula: hash(routing) % number of primary shards.

Question:
Can anyone point me to the hashing function used in the formula above? Is there a utility or source code that I can use?


#2

I found it myself in the ES code.

For anyone looking for the hash function, this is the expression from the ES codebase:

Math.floorMod(Murmur3HashFunction.hash(_routing), numberOfShards)

This outputs the number of the shard to which your _routing value will be allocated.

I hope this saves someone's day, especially if they don't get replies on this forum.
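
Murmur3HashFunction lives inside the Elasticsearch codebase, so the expression above cannot be run without it on the classpath. Below is a minimal standalone sketch of the same calculation. It assumes that ES encodes each character of the routing string as two bytes (low byte first) and hashes them with 32-bit Murmur3 using seed 0; the class name RoutingHashSketch and its method names are made up for illustration. It has not been verified against a running cluster, and some Elasticsearch versions may also factor a routing shard count into the result, so check the source of the version you run.

```java
final class RoutingHashSketch {

    // 32-bit MurmurHash3 (x86 variant) over a byte range, written from the public algorithm.
    static int murmur3_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // round down to a 4-byte boundary

        for (int i = offset; i < roundedEnd; i += 4) {
            // Read four bytes as a little-endian int.
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: 0 to 3 leftover bytes.
        int k1 = 0;
        switch (len & 0x03) {
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16; // fall through
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8;  // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: two bytes per char, low byte first, seed 0.
    static int hash(String routing) {
        byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            char c = routing.charAt(i);
            bytes[i * 2] = (byte) c;             // low byte
            bytes[i * 2 + 1] = (byte) (c >>> 8); // high byte
        }
        return murmur3_32(bytes, 0, bytes.length, 0);
    }

    // floorMod keeps the result in [0, numberOfShards) even when the hash is negative.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(hash(routing), numberOfShards);
    }

    public static void main(String[] args) {
        // Example routing values, chosen arbitrarily.
        for (String id : new String[] {"aa", "ab", "zz"}) {
            System.out.println(id + " -> shard " + shardFor(id, 5));
        }
    }
}
```

Math.floorMod matters here because the hash is a signed int: Java's % operator would return a negative value for a negative hash, which is not a valid shard number.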


(Christian Dahlqvist) #3

What are you going to use this for?


#4

To decide the number of primary shards for my use case. I am using a custom _routing value.
I need to choose a number of primary shards so that all my content is evenly distributed across shards.

My current issue: when I index data using multiple threads and all the data with the same _routing value gets shoved into the same shard, I am losing data.


(Christian Dahlqvist) #5

What are you using routing for? Is it based on a low-cardinality field?


#6

I have data divided into several groups. Each group has a two-character identifier. My _routing is not based on any field in my document.
I expect each group to be stored on a single shard.

My two-character identifier serves as the "_routing" value while indexing.
At search time I know the identifier, so it is easy to search.

I expect my groups to be evenly divided across my shards. Hence I need to decide on the optimum number of primary shards, one that works well for indexing and is also beneficial for searching.
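
One way to approach the question above is to hash every group identifier for a range of candidate shard counts and look at how many groups land on each shard. The sketch below does that. It is only an illustration under stated assumptions: the hash is a standalone 32-bit Murmur3 (seed 0, two bytes per character, low byte first) meant to mirror the expression from post #2 rather than the verified Elasticsearch implementation, and the class name ShardCountSketch, the method names, and the sample identifiers are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class ShardCountSketch {

    // 32-bit Murmur3 (seed 0) over the routing string, two bytes per char, low byte first.
    // Two chars form one 4-byte little-endian block; an odd last char is the 2-byte tail.
    static int hash(String routing) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = 0;
        int n = routing.length();
        for (int i = 0; i + 1 < n; i += 2) {
            int k = routing.charAt(i) | (routing.charAt(i + 1) << 16);
            k = Integer.rotateLeft(k * c1, 15) * c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13) * 5 + 0xe6546b64;
        }
        if ((n & 1) == 1) {
            int k = routing.charAt(n - 1);
            k = Integer.rotateLeft(k * c1, 15) * c2;
            h ^= k;
        }
        h ^= n * 2; // input length in bytes
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Number of groups that land on each shard for a given primary shard count.
    static int[] groupsPerShard(List<String> ids, int shards) {
        int[] counts = new int[shards];
        for (String id : ids) {
            counts[Math.floorMod(hash(id), shards)]++;
        }
        return counts;
    }

    // Largest entry, i.e. the number of groups on the busiest shard.
    static int max(int[] values) {
        int best = 0;
        for (int v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical identifiers; replace with the real two-character group ids.
        List<String> ids = Arrays.asList("AA", "AB", "BC", "CD", "DE", "EF", "XY", "ZZ");
        for (int shards = 1; shards <= 10; shards++) {
            int[] counts = groupsPerShard(ids, shards);
            System.out.println(shards + " shards: " + Arrays.toString(counts)
                    + ", busiest shard holds " + max(counts) + " group(s)");
        }
    }
}
```

This counts groups, not documents. If the groups differ a lot in size, weighting each identifier by its expected document count would give a more realistic picture of how evenly the data is spread.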

