[Help] Understanding routing to an index partition

Hey all
I have some questions regarding routing_partition_size field.
Not sure but upon reading the documentation , this is what i understood:

By default, using a routing key will access only 1 primary shard, for indexing document and fetching.
When specifying the routing_partition_size, the routing key will now access the number of shards specified in this field.

I tried to test this. I created an index with 6 primary shards only and specified routing_partition_size with a value of 4.

curl -X PUT localhost:9200/my-index-000000 -d '{ "settings": { "number_of_shards": 6 , "number_of_replicas": 0, "index.routing_partition_size": 4}, "mappings": {"_routing": {"required": true}}}' -H "Content-Ty
pe: application/json"

So in theory when indexing documents with a given key it should be distributed among 4 priamry shards, or i belived so. But i tried to index 100k documents and they all went to the same shard

for i in {0..100000}; do curl -X POST localhost:9200/my-index-000000/_doc?routing=key1 -d '{ "data": "something" }' -H "Content-Type: application/json"; done
curl -X GET localhost:9200/_cat/shards?v

my-index-000000                                               2     p      STARTED      0   226b es02
my-index-000000                                               5     p      STARTED      0   226b es02
my-index-000000                                               3     p      STARTED      0   226b es03
my-index-000000                                               1     p      STARTED      0   226b es01
my-index-000000                                               4     p      STARTED      0   226b es01
my-index-000000                                               0     p      STARTED 100001  2.7mb es03

Am i missing something, or did i understood this wrong?

Thanks in advance for the help
And happy new year :slight_smile:

Welcome to our community! :smiley:

I don't know this in detail, but my reading of the docs suggest that it'd work like this when indexing;

  1. From your 6 shards, only use 4
  2. Then index into one of those 4
  3. From your 6 shards, only use 4, but as you're looking at the entire 6 shards again, then it's possible that it'll pick ones that weren't previously used
  4. Then index into one of those 4
  5. and so on

Basically, just because you set the partition size to less than the total shard count, doesn't mean the shards that will be picked for indexing will always be the same.

But that's an educated guess based on my use of the product for a while, so I don't know the feature or the code down to a level to be able to state that as actual fact. Hopefully someone that does know can stop in and comment either way so we can both learn :slight_smile:

Hey @warkolm
Yup it seems to be as i was thinking.

It seems there is a bug issue going on github. This explains why the 100k documents went all to the same shard :sweat_smile:

Thanks for the answer and help :slight_smile:

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.