Hash function for routing in ES 5.5

Hi, all,

I'd like to order a long list of doc ids according to the shard each document is stored in. I paginate this list and run a query by id per page, so if most of the ids in a page belong to the same shard, system performance should improve.

To implement this I need to know which hash algorithm ES 5.5 uses for routing. I found some references to DJB in forums, but I don't know whether that is still valid.

Shards are designed to provide parallelism. Herding all requests to one shard and waiting for it to respond while all other shards stand idle is going to slow things down.

It seems I explained my use case badly.

My app gets a list of 5000 ids, paginates it and runs one query per page of ids.
At present, the ids are in random order and every query hits most of the shards.
If I knew in advance which shard each id belongs to, I could group the ids by shard so that only a few shards are involved in each request.
My understanding is that the fewer shards you hit per request, the fewer resources (memory, threads) you need, so system performance improves.

If you index your documents with the id as a routing key, you can fetch using the same routing key, which will cause only a single shard to be searched for each id.

Further, if your id is the Elasticsearch document id, then use GET rather than a search, and for efficiency's sake use MGET.
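
As a rough illustration of the MGET suggestion, the sketch below splits an id list into pages and builds one `_mget` request body per page. The helper names (`pages`, `mget_body`), the page size and the sample ids are made up for the example. It only builds the JSON bodies; sending them to the `_mget` endpoint of your index is left to whatever client you use.

```python
import json
from typing import Iterator, List


def pages(ids: List[str], page_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most page_size ids."""
    for start in range(0, len(ids), page_size):
        yield ids[start:start + page_size]


def mget_body(page: List[str]) -> str:
    """Build the JSON body for an _mget call whose URL already names the index."""
    return json.dumps({"ids": page})


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(250)]  # hypothetical ids
    for page in pages(ids):
        body = mget_body(page)
        print(len(page), "ids, body starts with", body[:30])
```

Each body asks for a whole page in one round trip, and Elasticsearch works out the shard for every id on its own.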

Thanks for your replies.

Actually, I'm already using MGET (one MGET call per page of ids).

What I'm trying to do is preprocess my list of 5000 ids so that it is sorted by shard (first all ids in shard 1, then all ids in shard 2, ...).
Once that is done, I could paginate the list (100 ids/page) and make an MGET request for each page, knowing that each request will access just one or two of my shards (instead of most of them, as happens now with my unsorted list).
All I need to know is the hash function currently used for routing in ES 5.5.

If you are using mget, it already goes to only one shard for each document, so you may not gain much by doing what you are suggesting.

So, does that mean MGET just performs a bunch of GET operations?
Would my approach make sense if I replaced MGET with an ids query?

MGET is a way to perform multiple GET requests in a single request, and is the most efficient way to return documents when you know the ids. I therefore do not see why you would need your approach.
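
For anyone who still wants to compute the shard on the client side, here is a minimal sketch. It rests on assumptions I have not checked against a 5.5 cluster: that routing since ES 2.0 uses Murmur3 (x86, 32-bit, seed 0) rather than DJB, that the routing value (the document id by default) is hashed as UTF-16 little-endian bytes, and that the shard is the floor-mod of the signed hash by the number of primary shards. It also assumes no custom routing, no parent/child routing and an index that was never shrunk. The function names are my own. You can compare the result with what `_search_shards?routing=<id>` reports before relying on it.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Murmur3 per-block scrambling step."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86 32-bit, returned as an unsigned int."""
    h = seed & MASK32
    full = len(data) & ~3  # length rounded down to whole 4-byte blocks
    for i in range(0, full, 4):
        h ^= _mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    if full < len(data):  # 1 to 3 trailing bytes
        h ^= _mix_k(int.from_bytes(data[full:], "little"))
    h ^= len(data)
    # Final avalanche.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def shard_for_id(doc_id: str, num_shards: int) -> int:
    """ASSUMPTION: mirrors ES routing as described above; verify on your cluster."""
    h = murmur3_x86_32(doc_id.encode("utf-16-le"))
    signed = h - (1 << 32) if h & 0x80000000 else h  # reinterpret as Java int
    return signed % num_shards  # Python % is already a floor-mod here


def sort_ids_by_shard(ids: List[str], num_shards: int) -> List[str]:
    """Order ids so that ids on the same (assumed) shard are adjacent."""
    return sorted(ids, key=lambda doc_id: shard_for_id(doc_id, num_shards))


if __name__ == "__main__":
    sample = [f"doc-{i}" for i in range(10)]  # hypothetical ids
    for doc_id in sort_ids_by_shard(sample, 5):
        print(shard_for_id(doc_id, 5), doc_id)
```

As said above, since MGET already sends each id only to its own shard, the gain from this kind of sorting may well be small, so it is worth measuring before adopting it.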
