Custom routing: how to create a new custom routing formula, or understand the hashing in detail?

In my data model I store data per language, so I am thinking of distributing documents with one language per shard. Say I have a product with ID 10 and product fields such as Label, cost, and color.
I store the data per language, with IDs like 10_1, 10_2, 10_3, etc., where 1, 2, 3 are language IDs.

10_1 => Label: english_label, color: the product's color in English
10_2 => Label: japanese_label, color: the product's color in Japanese

My concern is that Elasticsearch currently uses this routing formula:

shard_num = hash(_routing) % num_primary_shards

and in my case the routing value is the _id, e.g. 10_1.
So please help me find a formula that sends all products of the same language to a single shard, because no matter what I do, the hash() function changes the final value.
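A minimal Python sketch of the simplified formula above shows why routing on the _id scatters a language across shards. Elasticsearch actually hashes the routing value with Murmur3; this sketch swaps in zlib.crc32 as a stand-in, so the concrete shard numbers will not match a real cluster. The helper language_routing is hypothetical and assumes the "<product_id>_<language_id>" ID scheme from the question.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch uses Murmur3, not CRC32, so the
    # shard numbers produced here are illustrative only.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def language_routing(doc_id: str) -> str:
    # Hypothetical helper: IDs look like "<product_id>_<language_id>",
    # so the part after the last underscore is the language ID.
    return doc_id.rsplit("_", 1)[1]

NUM_SHARDS = 3
DOC_IDS = ["10_1", "10_2", "11_1", "11_2"]

# Default behaviour: the routing value is the _id itself.
by_id = {d: shard_for(d, NUM_SHARDS) for d in DOC_IDS}

# Routing on the language ID: all "*_1" docs share the routing value "1".
by_lang = {d: shard_for(language_routing(d), NUM_SHARDS) for d in DOC_IDS}

print(by_id)
print(by_lang)
```

Routing on the language ID keeps each language together, but nothing stops languages 1 and 2 from hashing to the same shard: the modulo only guarantees that one routing value maps to one shard.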

That's not a great idea, because language use is lumpy. A lot of people speak English (either natively or as an alternate language), so the shard for English will be huge.
Not many people speak Australian, so that shard would be small.

And then how do you manage shards when you want to add or remove languages? Do you just start a single index with 500 shards and hope you use them all? (Note: no, you should never do that; it's a huge waste.)

So the question is, what value do you see having everything from the same language in the same shard?

Hi Mark,

Yes, I understand your point about wasted space and poor architecture, but here are my answers to your points and questions:

I ensure that every language has data for every product.

When a new language is added, I'll put that language's data into one of the existing shards. Of course, I'll also write logic so that if the number of languages exceeds the number of shards by a certain percentage, I reindex the data into more shards (again, one shard per language). Not Elasticsearch's actual reindex, but something similar.
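The reindex step is unavoidable with hash-modulo routing, because changing the number of primary shards changes where most routing values land. A small sketch, again using zlib.crc32 as a stand-in for Elasticsearch's Murmur3 hash, counts how many routing keys move when an index grows; the 1000 keys and the 5 to 7 shard change are hypothetical numbers chosen for illustration. Under an idealised uniform hash, only values whose hash modulo 35 is below 5 keep their shard when going from 5 to 7 primaries, so roughly six in seven keys move.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in for Elasticsearch's Murmur3-based routing hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def moved_fraction(routing_keys, old_shards: int, new_shards: int) -> float:
    # Fraction of routing keys whose target shard differs after resizing.
    moved = sum(
        1 for k in routing_keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(routing_keys)

# Hypothetical: 1000 routing keys, growing an index from 5 to 7 primaries.
keys = [str(i) for i in range(1000)]
print(f"{moved_fraction(keys, 5, 7):.0%} of routing keys change shard")
```

This is why a document's shard cannot simply be recomputed in place: once the divisor changes, nearly every document has to be written again.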

The value I see is that all the data for a language sits in one shard. As I understand it, Elasticsearch first broadcasts a search to all shards, gathers the results, and then performs other operations such as sorting and relevance scoring. Instead, I would tell Elasticsearch to go to one particular shard, which should save time and work in Elasticsearch.
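Elasticsearch already lets you pass a routing value both when indexing a document and when searching, so a routed search only queries the shard that value maps to. The sketch below only builds HTTP request descriptors (method, path, JSON body) and sends nothing; the index name "products" and the language_id field are hypothetical. Because several languages can share a shard, the search also filters on language_id so other languages on the same shard are excluded. Whether this saves meaningful time still needs to be measured.

```python
import json
from urllib.parse import urlencode

INDEX = "products"  # hypothetical index name

def index_request(product_id: int, language_id: int, fields: dict):
    # Keep the "<product>_<language>" _id scheme, but route on the language ID
    # so every document of one language is stored on the same shard.
    doc_id = f"{product_id}_{language_id}"
    doc = {**fields, "language_id": language_id}  # hypothetical filter field
    path = f"/{INDEX}/_doc/{doc_id}?" + urlencode({"routing": language_id})
    return "PUT", path, json.dumps(doc)

def search_request(language_id: int, query: dict):
    # The same routing value at search time restricts the search to one shard.
    # That shard may also hold other languages, so filter on language_id too.
    body = {
        "query": {
            "bool": {
                "must": query,
                "filter": {"term": {"language_id": language_id}},
            }
        }
    }
    path = f"/{INDEX}/_search?" + urlencode({"routing": language_id})
    return "GET", path, json.dumps(body)

print(index_request(10, 2, {"label": "japanese_label"}))
print(search_request(2, {"match": {"label": "japanese_label"}}))
```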

I hope that makes the idea clear.
If you know how to improve the routing mechanism, please let me know.

Thanks and regards
Kshitij.

How much time will you save? Have you tested it?

Routing ensures that all data related to a specific routing key ends up in the same shard, but it does not make a specific shard hold data for just a single routing key. If this is what you are looking for, you MAY be better off having multiple indices with a single primary shard each.
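The index-per-language alternative can be sketched the same way, again only building request descriptors. The naming scheme products_lang_<id> is a hypothetical choice; number_of_shards is the standard index setting, and a wildcard index pattern in the search path still allows querying all languages at once.

```python
import json

def language_index(language_id: int) -> str:
    # Hypothetical naming scheme: one index per language.
    return f"products_lang_{language_id}"

def create_index_request(language_id: int):
    # A single primary shard per index, so each language gets its own shard.
    body = {"settings": {"index": {"number_of_shards": 1}}}
    return "PUT", f"/{language_index(language_id)}", json.dumps(body)

def search_one_language(language_id: int, query: dict):
    # Searching one index touches only that language's single shard.
    return "GET", f"/{language_index(language_id)}/_search", json.dumps({"query": query})

def search_all_languages(query: dict):
    # A wildcard index pattern still lets you search across every language.
    return "GET", "/products_lang_*/_search", json.dumps({"query": query})

print(create_index_request(3))
print(search_one_language(2, {"match": {"label": "japanese_label"}}))
```

With this layout, adding a language means creating one more index; no custom routing is needed and existing languages never have to be reindexed.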

Hi Mark,

No, I have not tested it, because currently I can't distribute data by language; one shard contains data for more than one language. It would be really great if you could tell me how to do this, either with some value for "routing" or by designing our own formula.

Bye and thanks

I don't think there is a way to do this and be sure that all shards are used.
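One way to see why no routing value gives a guarantee: if the hash is treated as a uniform random function (an idealisation of Murmur3 modulo the shard count, assumed here), the chance that k languages land on k distinct shards is k!/k^k. For five languages on five shards that is 120/3125, under 4%, and with more languages than shards it is zero by the pigeonhole principle.

```python
from fractions import Fraction
from math import factorial

def prob_one_language_per_shard(num_languages: int, num_shards: int) -> Fraction:
    # Assumes each routing value maps to a shard independently and uniformly.
    if num_languages > num_shards:
        return Fraction(0)  # pigeonhole: some shard must hold two languages
    # Ordered ways to pick distinct shards, over all possible assignments.
    favourable = factorial(num_shards) // factorial(num_shards - num_languages)
    return Fraction(favourable, num_shards ** num_languages)

for k in (2, 3, 5, 10):
    p = prob_one_language_per_shard(k, k)
    print(k, p, float(p))
```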

Did you consider my earlier comment?

Hi Christian,

Yes, I have considered your comment and am thinking it over. I will propose it to my team and discuss it, but before that I am thinking about changing the formula in some way.

Thanks for the reply
