How to get the shard ID from a document ID?

I am trying to do a large number of lookups and searches (on the order of a few million) based on document IDs. Instead of sending requests to all shards, I would like to partition the document IDs by the shards they belong to and then call lookup/search with preference=_shards:x,y.

I am having trouble mapping a document ID to a shard ID. I have looked at the following sources:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/routing/OperationRouting.java
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/routing/Murmur3HashFunction.java
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetaData.java

The relevant method in OperationRouting appears to be this one:

    private static int calculateScaledShardId(IndexMetaData indexMetaData, String effectiveRouting, int partitionOffset) {
        final int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;

        // we don't use IMD#getNumberOfShards since the index might have been shrunk such that we need to use the size
        // of original index to hash documents
        return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();
    }

I tried to reconstruct the shard ID locally based on this logic, but I don't seem to get the correct shard ID. Is there an easier way, or am I missing something?

If you are retrieving documents by ID, Elasticsearch will route the request only to the shard that holds the data. If you are instead looking to minimize the number of shards accessed when querying and to make sure related documents are colocated, you can use a routing parameter when indexing and searching. A common use case is an index that holds data from multiple users, where most queries filter on a single user's data. Routing by user ensures that all data for a given user resides in just one shard, so Elasticsearch can query that one shard instead of every shard in the index.
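As a rough illustration of the routing approach, the sketch below builds the two request lines involved: one that indexes a document with a routing value and one that searches with the same value. The index name `events`, the document ID `42`, and the routing value `user-1` are made up for the example; `routing` is the query-string parameter accepted by the index and search REST APIs. It only builds strings and does not talk to a cluster.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class RoutingExample {

    // Request line for indexing one document with a custom routing value.
    static String indexRequest(String index, String docId, String routing) {
        return "PUT /" + index + "/_doc/" + enc(docId) + "?routing=" + enc(routing);
    }

    // Request line for a search restricted to the shard(s) the routing value maps to.
    static String searchRequest(String index, String routing) {
        return "GET /" + index + "/_search?routing=" + enc(routing);
    }

    // Form-style URL encoding; sufficient for the simple values used in this sketch.
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical index, document ID, and user ID.
        System.out.println(indexRequest("events", "42", "user-1"));
        System.out.println(searchRequest("events", "user-1"));
    }
}
```

The same routing value has to be supplied at index time and at search time; otherwise the search may be sent to a shard that does not hold the user's documents.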

Trying to reverse engineer internal functionality and basing logic on it is risky, as it may break at any point without warning.
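With that caveat in mind, here is a minimal sketch of what the method quoted in the question appears to compute, for experimentation only. It rests on several assumptions: the document ID is used as the routing value (no custom routing), routing partitioning is not in use (so `partitionOffset` is 0), and `routingNumShards` has been read from the index metadata rather than guessed. Two details are easy to miss when reproducing this locally. As far as I can tell from the linked Murmur3HashFunction source, the string is hashed as two bytes per `char` in little-endian order, not as UTF-8. The modulus is also taken over the routing number of shards, which can differ from the number of primary shards. The class and method names below are hypothetical, and the behavior may differ between Elasticsearch versions.

```java
final class ShardIdSketch {

    // MurmurHash3 (x86, 32-bit variant) with seed 0 over the given bytes.
    static int murmur3_32(byte[] data) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = 0; // seed
        final int len = data.length;
        final int roundedEnd = len & 0xfffffffc; // length rounded down to whole 4-byte blocks

        for (int i = 0; i < roundedEnd; i += 4) {
            // Read one little-endian 32-bit block.
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Mix in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (len & 3) {
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16; // fall through
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8; // fall through
            case 1: k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hashes the routing string as two bytes per char, low byte first.
    static int hashRouting(String routing) {
        final byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            final char c = routing.charAt(i);
            bytes[i * 2] = (byte) c;
            bytes[i * 2 + 1] = (byte) (c >>> 8);
        }
        return murmur3_32(bytes);
    }

    // Mirrors the arithmetic of the quoted method with partitionOffset = 0.
    // Assumes routingNumShards is a multiple of numberOfShards.
    static int shardId(String docId, int routingNumShards, int numberOfShards) {
        final int routingFactor = routingNumShards / numberOfShards;
        return Math.floorMod(hashRouting(docId), routingNumShards) / routingFactor;
    }

    public static void main(String[] args) {
        // Hypothetical values: 5 primary shards, routing number of shards 640.
        System.out.println(shardId("my-doc-id", 640, 5));
    }
}
```

Before relying on this for millions of requests, it is worth checking the computed shard against the shard Elasticsearch actually reports for a sample of documents in the same index.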
