Finding optimum number of shards for custom routing

pokaleshrey · July 11, 2018, 3:11pm

Hi,
I am trying to find the optimum number of shards for my data.
My application uses a custom routing value.

I know the formula: hash(routing) % number of primary shards.

Question:
Can anyone point me to the hashing function used above? Is there a utility or source code which I can use?

pokaleshrey · July 12, 2018, 5:35am

I found it myself in the ES code.

For anyone who is looking for the hash function, please use this from the ES codebase:

Math.floorMod(Murmur3HashFunction.hash(_routing), numberOfShards)

This outputs the number of the shard to which your _routing value will be allocated.

I hope this saves someone's day, especially if they don't get replies on this forum.
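
The expression above relies on Elasticsearch's internal Murmur3HashFunction class. Below is a minimal standalone sketch of the same calculation. It assumes the routing string is hashed as two little-endian bytes per char with 32-bit MurmurHash3 and seed 0, which is my reading of the ES source rather than a documented contract. The names RoutingShardSketch, murmur3, routingHash and shardFor are made up for this example. As far as I know, the real routing code also takes index.number_of_routing_shards into account, so check the results against a real index before relying on them.

```java
final class RoutingShardSketch {

    // 32-bit MurmurHash3 (x86 variant) over the whole byte array.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int blocks = len & ~3; // length rounded down to a multiple of 4

        // Body: mix in one little-endian 4-byte block at a time.
        for (int i = 0; i < blocks; i += 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes, if any.
        int k = 0;
        switch (len & 3) {
            case 3:
                k ^= (data[blocks + 2] & 0xff) << 16; // fall through
            case 2:
                k ^= (data[blocks + 1] & 0xff) << 8; // fall through
            case 1:
                k ^= (data[blocks] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
        }

        // Finalization mix.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumption: each char becomes two bytes, low byte first, and the seed is 0.
    static int routingHash(String routing) {
        byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            char c = routing.charAt(i);
            bytes[2 * i] = (byte) c;
            bytes[2 * i + 1] = (byte) (c >>> 8);
        }
        return murmur3(bytes, 0);
    }

    // floorMod keeps the result in [0, numberOfShards) even for negative hashes.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routingHash(routing), numberOfShards);
    }

    public static void main(String[] args) {
        // "AB" and "CD" are example routing values, not taken from real data.
        for (String routing : new String[] {"AB", "CD"}) {
            System.out.println(routing + " -> shard " + shardFor(routing, 5));
        }
    }
}
```

Math.floorMod is used instead of % because the hash can be negative, and % would then return a negative shard number.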

Christian_Dahlqvist · July 12, 2018, 5:57am

What are you going to use this for?

pokaleshrey · July 12, 2018, 6:24am

To decide the number of primary shards for my use case. I am using custom _routing.
I need to choose a number of primary shards so that all my content is evenly distributed across shards.

My current issue: when I index data using multiple threads and all the data with the same _routing value gets shoved into the same shard, I am losing data.

Christian_Dahlqvist · July 12, 2018, 7:08am

What are you using routing for? Is it based on a low-cardinality field?

pokaleshrey · July 12, 2018, 7:16am

I have data divided into several groups. Each group has a two-character identifier. My _routing is not based on any field in my document.
I expect each group to be stored on a single shard.

My two-character identifier serves as my "_routing" while indexing.
At search time I know my identifier, so it is easy to search.

I expect my groups to be evenly divided across my shards. Hence I need to decide on an optimum number of primary shards that works well for indexing and is also beneficial for searching.
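
One way to compare candidate shard counts is to count how many group identifiers land on each shard for every candidate and pick a count where the busiest shard holds the fewest groups. Below is a rough sketch of that idea. ShardCountSketch and groupsPerShard are hypothetical names, the identifiers are invented, and String::hashCode is only a stand-in hash. To get numbers that resemble what Elasticsearch does, pass in a Murmur3-based routing hash instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class ShardCountSketch {

    // Counts how many routing values map to each shard under the given hash.
    static int[] groupsPerShard(List<String> routingValues,
                                int numberOfShards,
                                ToIntFunction<String> hash) {
        int[] counts = new int[numberOfShards];
        for (String routing : routingValues) {
            // Same shape as hash(routing) % number of primary shards, kept non-negative.
            counts[Math.floorMod(hash.applyAsInt(routing), numberOfShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented two-character group identifiers, for illustration only.
        List<String> groups = Arrays.asList("AA", "AB", "BC", "CD", "XY", "ZZ");

        // Stand-in hash (an assumption); Elasticsearch uses a Murmur3-based hash.
        ToIntFunction<String> standInHash = String::hashCode;

        for (int shards = 2; shards <= 8; shards++) {
            int[] counts = groupsPerShard(groups, shards, standInHash);
            int busiest = Arrays.stream(counts).max().getAsInt();
            System.out.println(shards + " shards -> " + Arrays.toString(counts)
                    + ", busiest shard holds " + busiest + " group(s)");
        }
    }
}
```

This only counts groups per shard. If the groups differ a lot in size, weighting each group by its document count would give a more useful picture.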

system · August 9, 2018, 7:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
