What is the `index.number_of_routing_shards` setting? How can I calculate it based on the number of shards?

I'm aware of a similar question to this topic. Namely What are "number_of_routing_shards"? - #2 by steffens. But it does not answer my question or help me understand the index.number_of_routing_shards setting.

I'm hoping someone can help me grok the index.number_of_routing_shards setting. I've come across it in relation to _routing where it is used to select a shard. e.g.

routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(_routing) % num_routing_shards) / routing_factor

I don't understand what this value would be in practice. A multiple of the primary shard number, I get. But I don't know how I'd calculate it. For example, in an index with the num_primary_shards set to 90, how can I deduce the number_of_routing_shards?

To provide context, we're evaluating if we should adopt a custom _routing rule. I'm introducing metrics that'll calculate how balanced / unbalanced a custom routing would make our Elasticsearch indexes. i.e. Would we route more traffic to some shards versus others. To this end I need to be able to determine which shard will be used. Looking at the documentation the formula seems relatively simple, but I'm struggling to understand number_of_routing_shards, which is central to it. Initially I thought it might just be the total number of shards. But that does not seem to be the case.

BTW if you've an alternative approach or know a utility I can call to calculate the selected shard that too would be of interest.

I think a custom routing rule based on this knowledge would end up being dangerously brittle, but the docs you link pretty much say how to calculate its default:

This setting’s default value depends on the number of primary shards in the index. The default is designed to allow you to split by factors of 2 up to a maximum of 1024 shards.

In other words you take num_primary_shards and multiply it by 2 repeatedly until you get a number between 513 and 1024. For 90 it's 90*2*2*2=720.

I'm content to take your word on 720. To be honest I can't claim I would have come to that conclusion by myself. Nor did the ~10 engineers I asked to look at the documentation. Perhaps it's obvious to ES team member but harder for someone outside ES to understand. An enhancement to the documents might be worthwhile.

I think a custom routing rule based on this knowledge would end up being dangerously brittle

I'd be interested to hear more on this. The brittleness I'm considering is if a shard (and it's two replicas) goes offline. Then anything being read or written to it, by the routing rule, is in trouble. Are there any other concerns / dangers?

The temptation is that, for 50% of our traffic, we can drop the search request cost from 90 shards to a single shard. Which would help with our CPU usage.

It's deliberately obscure because this is very much an implementation detail on which you shouldn't rely.

I meant that if you design a routing scheme based on intimate knowledge of how each routing value maps to the individual shards, perhaps carefully choosing routing values so that each shard remains roughly equal in size, then you are relying on these implementation details not changing in future. If they did change (or say you changed the number of shards in your index) then you could end up with something very unbalanced in its place.

A single shard which contains all the matching docs so has to work correspondingly harder tho, so it's not a guaranteed win.

How large are your shards out of interest?

That makes sense. Helps explain why I found it hard to understand.

I see what you mean. In terms of the number_of_routing_shards setting that's not a big worry. The plan is for this calculation to be temporary. We'll use it to generate metrics to tell us how stable the solution would be. With this information we can come to a go / no go decision. After which the code will be removed. As we're targeting a specific version of ES7, and have no plans to upgrade within a near period of time, we should get away with it. I did notice this logic has changed in ES8.

Our data model is made up of documents, groups, and organisations. Currently _routing is based on the document's id. The metrics I'm putting in place will simulation the effect of switching _routing over to the group and organisation ids. After which we'll weigh the pros and cons.

This is true but we think, with the group id, the balance should be roughly consistent. With the organisation id we're more at risk from large orgs. Which is why the group id is the favourite.

The current index's shards are ~30 GB. The shards in the previous indexes reached ~50-57 GB before a new index was cut. A new index is cut once every ~6 weeks.

The code I used to simulate _routing is

int partitionOffset = 0;
int routingFactor = properties.getEsRoutingSimulationNumberOfRoutingShards() / properties.getEsRoutingSimulationNumberOfShards();
int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;
return Math.floorMod(hash, properties.getEsRoutingSimulationNumberOfRoutingShards()) / routingFactor;

where getEsRoutingSimulationNumberOfShards() is 30 and getEsRoutingSimulationNumberOfRoutingShards() is set to 960.

Looking at the simulation results for the doc _id, rather than the group or org id, the results look mostly ok. But one shard seems to be getting more documents routed towards it. i.e. It gets 4% of the overall traffic versus the other shards receiving 3%. Checking Elasticsearch directly via it's _cat/shards API its more balanced. i.e. All the shards seem to be on 3%.

That makes me think that either

  • I've a mistake, or missed something, in the formula above or property values
  • Or Elasticsearch does a little more load balancing under the covers

Any chance the second point is possible? Especially when there is no custom _routing rule defined.

I can't reproduce this. I tried this test:

    public void testDistribution() {
        var numRoutingShards = 960;
        var numShards = 30;
        var routingFactor = numRoutingShards / numShards;
        var frequencies = new int[numShards];
        var iterations = 100_000_000;
        for (int i = 0; i < iterations; i++) {
            frequencies[Math.floorMod(Murmur3HashFunction.hash(UUIDs.base64UUID()), numRoutingShards) / routingFactor] += 1;
        }
        var expected = iterations / numShards;
        for (int i = 0; i < frequencies.length; i++) {
            logger.info("frequencies[{}] = {} (deviation from mean {})", i, frequencies[i], frequencies[i] - expected);
        }
        var minFreq = Arrays.stream(frequencies).min().orElseThrow();
        var maxFreq = Arrays.stream(frequencies).max().orElseThrow();
        logger.info("min {} max {} range {}", minFreq, maxFreq, maxFreq-minFreq);
    }

Here's a typical output:

[2022-03-03T18:22:30,551][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] before test
[2022-03-03T18:22:42,529][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[0] = 3333122 (deviation from mean -211)
[2022-03-03T18:22:42,529][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[1] = 3333152 (deviation from mean -181)
[2022-03-03T18:22:42,529][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[2] = 3333617 (deviation from mean 284)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[3] = 3332770 (deviation from mean -563)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[4] = 3332795 (deviation from mean -538)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[5] = 3334148 (deviation from mean 815)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[6] = 3333634 (deviation from mean 301)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[7] = 3330884 (deviation from mean -2449)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[8] = 3335396 (deviation from mean 2063)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[9] = 3330848 (deviation from mean -2485)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[10] = 3330694 (deviation from mean -2639)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[11] = 3335203 (deviation from mean 1870)
[2022-03-03T18:22:42,530][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[12] = 3336015 (deviation from mean 2682)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[13] = 3332726 (deviation from mean -607)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[14] = 3329691 (deviation from mean -3642)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[15] = 3333681 (deviation from mean 348)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[16] = 3332674 (deviation from mean -659)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[17] = 3337441 (deviation from mean 4108)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[18] = 3335784 (deviation from mean 2451)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[19] = 3331326 (deviation from mean -2007)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[20] = 3332416 (deviation from mean -917)
[2022-03-03T18:22:42,531][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[21] = 3331359 (deviation from mean -1974)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[22] = 3334882 (deviation from mean 1549)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[23] = 3333240 (deviation from mean -93)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[24] = 3333283 (deviation from mean -50)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[25] = 3332731 (deviation from mean -602)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[26] = 3333295 (deviation from mean -38)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[27] = 3334274 (deviation from mean 941)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[28] = 3336348 (deviation from mean 3015)
[2022-03-03T18:22:42,532][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] frequencies[29] = 3332571 (deviation from mean -762)
[2022-03-03T18:22:42,533][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] min 3329691 max 3337441 range 7750
[2022-03-03T18:22:42,551][INFO ][o.e.c.r.o.h.m.Murmur3HashFunctionTests] [testDistribution] after test

I'm going to guess that something has introduced some skew into your doc IDs.

That was my first suspicion. It makes sense, though the UUIDs (aka effectiveRouting) are fairly straight forward.

What makes me concerned is that this is not replicated in the shard's document count returned by the /_cat/shards API. That means, unless Elasticsearch 7 has extra routing logic to help balance the shards, then something is wrong in the formula / my simulation code.

I'll try focus on that area.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.