I'd like to sort a long list of document ids by the shard each document is stored in. I paginate this list and run one query by id per page, so if most of the ids in a page belong to the same shard, system performance should improve.
To implement this, I need to know which hash algorithm ES 5.5 uses for routing. I found some references to DJB on forums, but I don't know whether that is still valid.
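A minimal Python sketch of the shard computation follows. It rests on our understanding of ES 5.x, which should be verified against the source of your exact version. Indices created on 2.0 or later route with Murmur3 (32-bit x86 variant, seed 0) rather than DJB, which was the 1.x default. The routing value defaults to the document `_id` and is hashed as its UTF-16 code units in little-endian byte order, and the shard is the floor modulo of that hash by the number of primary shards. It further assumes no custom `_routing`, no `routing_partition_size`, and an index that has not been shrunk. `murmur3_x86_32` and `shard_for_id` are names introduced here for illustration. Before relying on the results, spot-check a few ids against the cluster, for example with the `_search_shards` API and a `routing` parameter.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as a signed int like Java's."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    length = len(data)
    h1 = seed & mask
    rounded_end = length & ~3  # bytes covered by full 4-byte blocks

    def rotl32(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    # Body: mix each little-endian 4-byte block into the hash.
    for i in range(0, rounded_end, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = rotl32((k1 * c1) & mask, 15)
        k1 = (k1 * c2) & mask
        h1 = rotl32(h1 ^ k1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 0-3 bytes, if any.
    tail = data[rounded_end:]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = rotl32((k1 * c1) & mask, 15)
        k1 = (k1 * c2) & mask
        h1 ^= k1

    # Finalization (fmix32).
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & mask
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & mask
    h1 ^= h1 >> 16

    # Reinterpret as a signed 32-bit value, as a Java int would be.
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def shard_for_id(doc_id: str, number_of_shards: int) -> int:
    """Assumed ES 5.x default routing: floorMod(murmur3(_id), number_of_shards)."""
    # Assumption: each Java char is hashed as 2 bytes, low byte first,
    # which matches Python's UTF-16-LE encoding of the string.
    h = murmur3_x86_32(doc_id.encode("utf-16-le"))
    # Python's % with a positive divisor is a floor modulo (like Math.floorMod),
    # so negative hashes still map into 0..number_of_shards-1.
    return h % number_of_shards
```

For an index with 5 primary shards, `shard_for_id(doc_id, 5)` would give the shard to sort by. If any spot-check disagrees with `_search_shards`, one of the assumptions above does not hold for your index.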
Shards are designed to provide parallelism. Herding all requests to one shard and waiting for it to respond while all other shards stand idle is going to slow things down.
My app gets a list of 5000 ids, paginates it and runs one query per page of ids.
At present, ids are in random order and every query hits most of the shards.
If I could know in advance which shard each id belongs to, I'd group the ids by shard so that only a few shards are involved in each request.
My understanding is that the fewer shards you hit per request, the fewer resources (memory, threads) each request needs, so system performance improves.
If you index your documents with the id as a routing key, you can fetch using the same routing key, which will cause only a single shard to be searched for each id.
Actually, I'm using MGET (one MGET call per page of ids).
What I'm trying to do is preprocess my list of 5000 ids to sort it by shard (first all ids in shard 1, then all ids in shard 2, ...).
Once that's done, I could paginate the list (100 ids per page) and make an MGET request for each page, knowing that each request will access just one or two of my shards (instead of most of them, as happens now with my unsorted list).
All I need to know is the hash function ES 5.5 currently uses for routing.
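The preprocessing described above can be sketched in Python as follows. `shard_of` is a hypothetical callable that returns the shard number for an id (for example, a reimplementation of the routing hash). The `int(doc_id) % 5` function in the usage example is only a placeholder and is not how Elasticsearch routes documents. The `{"ids": [...]}` body is the form `_mget` accepts when the index and type are given in the URL. Actually sending the requests is left out.

```python
from typing import Callable, Dict, List, Sequence


def pages_sorted_by_shard(
    ids: Sequence[str], shard_of: Callable[[str], int], page_size: int = 100
) -> List[List[str]]:
    """Order ids by shard (all of shard 0, then shard 1, ...) and cut into pages."""
    shard = {doc_id: shard_of(doc_id) for doc_id in ids}  # compute each shard once
    # sorted() is stable, so ids keep their original relative order within a shard.
    ordered = sorted(ids, key=shard.__getitem__)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def shards_touched(page: Sequence[str], shard_of: Callable[[str], int]) -> int:
    """Number of distinct shards a single MGET for this page would hit."""
    return len({shard_of(doc_id) for doc_id in page})


def mget_body(page: Sequence[str]) -> Dict[str, List[str]]:
    """Request body for one MGET call when index and type are in the URL."""
    return {"ids": list(page)}


def toy_shard(doc_id: str) -> int:
    """Placeholder shard function for the demo; NOT Elasticsearch's routing."""
    return int(doc_id) % 5


if __name__ == "__main__":
    ids = [str(n) for n in range(5000)]
    pages = pages_sorted_by_shard(ids, toy_shard)
    # Number of pages, and the most shards any single page touches.
    print(len(pages), max(shards_touched(p, toy_shard) for p in pages))
```

With the placeholder, every shard holds exactly 1000 ids, so the 50 pages each touch a single shard and the script prints `50 1`. With real routing the per-shard counts are not multiples of 100, so a page that straddles the boundary between two shards touches two of them.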
MGET is a way to perform multiple GET requests in a single request, and is the most efficient way to return documents when you know the ids. I therefore do not see why you would need your approach.