Custom routing: how to write a custom routing formula, or understand the hashing in detail?

kshi_yelpale000 · January 16, 2019, 9:14pm

In my data model I store data per language, so I am thinking of distributing documents with one language per shard. Say I have a product with id 10 and fields such as Label, cost, or color.
I store the data per language as 10_1, 10_2, 10_3, etc., where 1, 2, 3 are language ids.

10_1 => Label: english_label, color: color of the product in English
10_2 => Label: japanese_label, color: color of the product in Japanese

My concern is that Elasticsearch currently uses this routing formula:
shard no. = hash(routing) % no. of primary shards
and in my case routing = _id, e.g. 10_1.
Please help me find a formula that sends all products of the same language to a single shard. Whatever routing value I pass, the hash() function changes the final value internally.
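
To illustrate the formula above, here is a minimal Python sketch. It is not Elasticsearch's implementation: Elasticsearch hashes the routing value with Murmur3, while `zlib.crc32` below is only a stand-in, and the function names are made up for this example. It shows that one routing value always lands on the same shard, but also that several routing values can end up sharing a shard.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard no. = hash(routing) % no. of primary shards.

    Assumption: CRC32 is a stand-in hash, NOT the Murmur3 hash
    that Elasticsearch actually applies to the routing value.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_by_language(language_ids, num_primary_shards):
    """Map each language id (used as the routing value) to a shard number."""
    return {
        lang: shard_for(str(lang), num_primary_shards)
        for lang in language_ids
    }


if __name__ == "__main__":
    # Routing on the language id keeps one language together, but with
    # more languages than shards some languages must share a shard.
    for lang, shard in shards_by_language([1, 2, 3, 4], 3).items():
        print(f"language {lang} -> shard {shard}")
```

Even with as many shards as languages, nothing in the formula guarantees that each language gets a shard of its own.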

warkolm · January 17, 2019, 12:19am

That's not a great idea, because language use is lumpy. A lot of people speak English (either natively or as an alternate language), so your shard for that will be huge.
Not many people speak Australian, so that shard would be small.

And then how do you manage shards when you want to add/remove languages? Do you just start a single index with 500 shards and hope you use them all? (Note, no, you should never do that, it's a huge waste).

So the question is, what value do you see having everything from the same language in the same shard?

kshi_yelpale000 · January 17, 2019, 8:42am

Hi Mark,

Yes, I understand your point about wasted space or a poor architecture, but here are my answers to your points and questions:

I ensure that every language will have data for every product.

When a language is added, I'll put its data into one of the existing shards. I'll write logic so that if the number of languages exceeds the number of shards by a certain percentage, I reindex the data with more shards (again one shard per language). Not Elasticsearch's actual reindex, but something similar.

As for the value I see: all data for a product in a given language lives together. As I understand it, an Elasticsearch node initially broadcasts the search to all shards, gathers the results, and then performs other operations such as sorting and relevance scoring. Instead, I'll tell Elasticsearch to go to one particular shard, so that time and work on the Elasticsearch side are saved.
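
A rough sketch of what that would look like with the existing `routing` request parameter, using the language id as the routing value. The index name `products` and the field name `language_id` are assumptions for this example, and the code only builds request paths and bodies without sending anything.

```python
from urllib.parse import quote, urlencode

INDEX = "products"  # hypothetical index name


def index_path(doc_id: str, language_id: int) -> str:
    """Path for indexing a document, routed by its language id."""
    query = urlencode({"routing": language_id})
    return f"/{INDEX}/_doc/{quote(doc_id, safe='')}?{query}"


def search_path(language_id: int) -> str:
    """Path for a search that only visits the shard for this routing value."""
    return f"/{INDEX}/_search?" + urlencode({"routing": language_id})


def search_body(language_id: int, text: str) -> dict:
    """Query body; `language_id` and `Label` are assumed field names.

    The filter is still needed: the routed shard may also hold
    documents of other languages that hash to the same shard.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"Label": text}}],
                "filter": [{"term": {"language_id": language_id}}],
            }
        }
    }


if __name__ == "__main__":
    print(index_path("10_1", 1))
    print(search_path(1))
    print(search_body(1, "shirt"))
```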

I hope you get the idea now.
If you know how to improve the routing mechanism, please let me know.

Thanks and regards
Kshitij.

warkolm · January 17, 2019, 9:25am

How much time will you save? Have you tested it?

Christian_Dahlqvist · January 17, 2019, 9:44am

Routing ensures that all data related to a specific routing key ends up in the same shard, but it does not make a specific shard hold data for just a single routing key. If this is what you are looking for, you may be better off having multiple indices with a single primary shard each.
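
A minimal sketch of that layout, assuming ids of the form `<product id>_<language id>` as in the original post. The index naming scheme `products_lang_<id>` is hypothetical, and the code only builds the request that would create each index; it does not call a cluster.

```python
from typing import Tuple


def split_doc_id(doc_id: str) -> Tuple[str, str]:
    """Split '10_1' into product id '10' and language id '1'."""
    product_id, language_id = doc_id.rsplit("_", 1)
    return product_id, language_id


def index_name_for(language_id: str) -> str:
    """Hypothetical naming scheme: one index per language."""
    return f"products_lang_{language_id}"


def create_index_request(language_id: str):
    """Method, path and body for creating a single-primary-shard index."""
    body = {"settings": {"index": {"number_of_shards": 1}}}
    return "PUT", f"/{index_name_for(language_id)}", body


if __name__ == "__main__":
    for doc_id in ("10_1", "10_2", "11_1"):
        _, lang = split_doc_id(doc_id)
        print(doc_id, "->", index_name_for(lang))
    print(create_index_request("1"))
```

With this layout a search for one language targets one index, so no custom hashing is needed, and adding a language means creating one more index.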

kshi_yelpale000 · January 22, 2019, 7:25pm

Hi Mark,

No, I have not tested it, because currently I am not able to distribute data per language, and one shard contains data for more than one language. It would be really great if you could tell me how to do this, either with some value for "routing" or by designing our own formula.

Bye and thanks

warkolm · January 22, 2019, 10:25pm

I don't think there is a way to do this and be sure that all shards are used.

Christian_Dahlqvist · January 23, 2019, 3:22am

Did you consider my earlier comment?

kshi_yelpale000 · January 23, 2019, 11:57am

Hi Christian,

Yes, I have considered your comment and am thinking about it. I will propose it to my team and discuss it, but before that I am thinking of changing the formula in some way.

Thanks for the reply

system · February 20, 2019, 11:57am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
