Finding the optimum number of shards for custom routing

Hi,
I am trying to find the optimum number of shards for my data.
My application uses a custom routing value.

I know the formula: hash(routing) % num of primary shards.

Question:
Can anyone point me to the hashing function used in the formula above? Is there a utility or source code that I can use?

I found it out myself from the ES code.

Anyone looking for the hash function can use this from the ES codebase:

Math.floorMod(Murmur3HashFunction.hash(_routing), numberOfShards)

This outputs the number of the shard to which your _routing value will be allocated.
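For anyone who wants to try this without pulling in the ES jars, here is a minimal standalone sketch. It rests on assumptions I have not confirmed for every version: that Murmur3HashFunction.hash encodes the routing string as two bytes per char (low byte first) and feeds it to a 32-bit Murmur3 with seed 0. The class name RoutingShardSketch and the sample identifiers are made up for illustration, and some ES versions may apply extra factors to the formula, so treat the output as an estimate.

```java
public class RoutingShardSketch {

    // Plain 32-bit Murmur3 (x86 variant) over a byte range.
    static int murmur3_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // last full 4-byte block

        for (int i = offset; i < roundedEnd; i += 4) {
            // Read the block as a little-endian int.
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Mix in the remaining 1 to 3 bytes, if any.
        int tail = len & 0x03;
        if (tail > 0) {
            int k1 = 0;
            if (tail == 3) k1 ^= (data[roundedEnd + 2] & 0xff) << 16;
            if (tail >= 2) k1 ^= (data[roundedEnd + 1] & 0xff) << 8;
            k1 ^= (data[roundedEnd] & 0xff);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: the routing string is hashed as 2 bytes per char, low byte first, seed 0.
    static int hash(String routing) {
        byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            char c = routing.charAt(i);
            bytes[i * 2] = (byte) c;
            bytes[i * 2 + 1] = (byte) (c >>> 8);
        }
        return murmur3_32(bytes, 0, bytes.length, 0);
    }

    // Same formula as quoted above: floorMod keeps the result in [0, numberOfShards).
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(hash(routing), numberOfShards);
    }

    public static void main(String[] args) {
        int numberOfShards = 5; // hypothetical candidate value
        for (String routing : new String[] {"AA", "AB", "ZZ"}) { // hypothetical identifiers
            System.out.println(routing + " -> shard " + shardFor(routing, numberOfShards));
        }
    }
}
```

It is worth comparing a few results against a real index before relying on them, for example with the search shards API (GET /<index>/_search_shards?routing=<value>), which reports the shard a routing value resolves to.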

I hope this saves someone's day, especially if they don't get replies on this forum.

What are you going to use this for?

To decide the number of primary shards for my use case. I am using custom _routing.
I need to choose a number of primary shards so that all my content is evenly distributed across shards.

My current issue: when I index data using multiple threads, all the documents with the same _routing are sent to the same shard, and I am losing data.

What are you using routing for? Is it based on a low-cardinality field?

I have data divided into several groups. Each group has a two-character identifier. My _routing is not based on any field in my document.
I expect each group to be stored on a single shard.

My two-character identifier serves as my "_routing" while indexing.
At search time I know my identifier, so it is easy to search.

I expect my groups to be evenly divided across my shards. Hence I need to decide the optimum number of primary shards, one that works well for indexing and is also beneficial for searching.
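As a rough sanity check on that expectation, here is a small sketch that estimates how many shards would receive no group at all. It assumes each identifier lands on a uniformly random shard independently of the others, which only approximates what a hash does with a small set of keys. The class name ShardBalanceEstimate, the group count, and the candidate shard counts are all hypothetical.

```java
public class ShardBalanceEstimate {

    // Expected number of shards that receive no group, assuming every group is
    // assigned to one of `shards` shards uniformly at random and independently.
    // Each shard is missed by one group with probability (1 - 1/shards).
    static double expectedEmptyShards(int groups, int shards) {
        return shards * Math.pow(1.0 - 1.0 / shards, groups);
    }

    public static void main(String[] args) {
        int groups = 50; // hypothetical number of two-character identifiers
        for (int shards : new int[] {2, 5, 10, 20}) { // hypothetical candidates
            System.out.printf("shards=%d expectedEmpty=%.2f%n",
                    shards, expectedEmptyShards(groups, shards));
        }
    }
}
```

With few groups relative to shards, some shards are likely to stay empty while others hold several groups, so an exactly even split should not be expected from hashing alone.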