Hash function for routing in ES 5.5

jmartinter · October 5, 2017, 3:34pm

Hi, all,

I'd like to order a long list of doc ids according to the shard each document is stored in. I paginate this list and run a query by id per page, so if most of the ids in a page belong to the same shard, system performance should improve.

To implement this, I need to know which hash algorithm ES 5.5 uses for routing. I found some references to DJB in forums, but I don't know if that is still valid.

Mark_Harwood · October 5, 2017, 3:47pm

Shards are designed to provide parallelism. Herding all requests to one shard and waiting for it to respond while all other shards stand idle is going to slow things down.

jmartinter · October 6, 2017, 6:53am

It seems I explained my use case badly.

My app gets a list of 5000 ids, paginates it and runs one query per page of ids.
At present, the ids are in random order and every query hits most of the shards.
If I knew in advance which shard each id belongs to, I could group the ids by shard so that only a few of the shards are involved in each request.
My understanding is that the fewer shards you hit per request, the fewer resources (memory, threads) you require, so system performance improves.

Christian_Dahlqvist · October 6, 2017, 7:05am

If you index your documents with the id as a routing key, you can fetch using the same routing key, which will cause only a single shard to be searched for each id.
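
As an illustration of the routing-key suggestion, here is a minimal sketch that builds the REST path for indexing or fetching a single document with an explicit `routing` query parameter. The index and type names are hypothetical, and `doc_path` is a helper invented for this example, not part of any Elasticsearch client. When no routing value is supplied, Elasticsearch routes on the document id by default, so passing the id as the routing key mainly makes that choice explicit.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def doc_path(index: str, doc_type: str, doc_id: str,
             routing: Optional[str] = None) -> str:
    """Build the path for one document, optionally adding ?routing=<key>.

    The same path (with the same routing value) would be used for the
    indexing request and for the later GET.
    """
    path = "/" + "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    if routing is not None:
        path += "?" + urlencode({"routing": routing})
    return path


# Hypothetical index/type names; the id doubles as the routing key.
print(doc_path("my_index", "my_type", "42", routing="42"))
```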

Mark_Harwood · October 6, 2017, 8:14am

Further, if your id is the Elasticsearch document id, then use GET, not a search, and for efficiency's sake use MGET.

jmartinter · October 6, 2017, 10:37am

Thanks for your replies.

Actually, I'm using MGET (one MGET call for each page of ids).

What I'm trying to do is preprocess my list of 5000 ids to sort it by shard (first all ids in shard1, then all ids in shard2, ...).
Once that is done, I could paginate the list (100 ids/page) and make an MGET request for each page, knowing each request will access just one or two of my shards (instead of most of them, as happens now with my current unsorted list).
All I need to know is the hash function currently used for routing in ES 5.5.
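
The preprocessing described above can be sketched as follows. The Elasticsearch reference documentation describes routing as `shard_num = hash(_routing) % num_primary_shards`, but this thread never confirms which hash ES 5.5 uses (DJB is mentioned only as an unverified forum reference). The sketch therefore takes the shard function as a parameter and uses `zlib.crc32` purely as a stand-in: it is NOT Elasticsearch's routing hash, and the shard numbers it produces will not match a real cluster. All function names here are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def stand_in_shard(doc_id: str, num_shards: int) -> int:
    # Placeholder only: crc32 is NOT the hash Elasticsearch uses for routing.
    # Swap in the real function once it has been confirmed for your version.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def sort_ids_by_shard(ids: Sequence[str], num_shards: int,
                      shard_of: Callable[[str, int], int] = stand_in_shard) -> List[str]:
    """Return the ids reordered so that ids mapped to the same shard are adjacent."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for doc_id in ids:
        buckets[shard_of(doc_id, num_shards)].append(doc_id)
    ordered: List[str] = []
    for shard in sorted(buckets):
        ordered.extend(buckets[shard])
    return ordered


def paginate(items: Sequence[str], page_size: int = 100) -> List[List[str]]:
    """Split the list into consecutive pages of at most page_size items."""
    return [list(items[i:i + page_size]) for i in range(0, len(items), page_size)]


ids = [str(i) for i in range(5000)]
pages = paginate(sort_ids_by_shard(ids, num_shards=5), page_size=100)
print(len(pages))
```

With ids grouped this way, most pages contain ids from a single shard and only the pages that straddle a bucket boundary touch two.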

Christian_Dahlqvist · October 6, 2017, 10:47am

If you are using mget, it already goes to only one shard for each document, so you may not gain much by doing what you are suggesting.

jmartinter · October 6, 2017, 12:45pm

So, does that mean MGET just performs a bunch of GET operations?
Would my approach make sense if I replaced MGET with an ids query?

Christian_Dahlqvist · October 6, 2017, 12:56pm

MGET is a way to perform multiple GET requests in a single request, and is the most efficient way to return documents when you know the ids. I therefore do not see why you would need your approach.
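
For reference, a minimal sketch of the paged MGET calls discussed in this thread. It only builds the request path and JSON body for each page and sends nothing over the network. The index and type names are hypothetical, `mget_requests` is a helper made up for this example, and the `{"ids": [...]}` body assumes the index and type are given in the URL.

```python
import json
from typing import List, Sequence, Tuple


def mget_requests(index: str, doc_type: str, ids: Sequence[str],
                  page_size: int = 100) -> List[Tuple[str, str]]:
    """Return one (path, json_body) pair per page of ids for the _mget endpoint."""
    path = f"/{index}/{doc_type}/_mget"
    requests = []
    for start in range(0, len(ids), page_size):
        page = list(ids[start:start + page_size])
        requests.append((path, json.dumps({"ids": page})))
    return requests


# Hypothetical names; 250 ids are split into pages of 100, 100 and 50.
for path, body in mget_requests("my_index", "my_type", [str(i) for i in range(250)]):
    print(path, len(json.loads(body)["ids"]))
```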

system · November 3, 2017, 1:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.