Allocating data to specific shard and node

Hi,

I can see that it is possible to allocate shards to particular nodes, and (some) routing of data is also possible, but how can I ensure that each group of data always ends up in its own shard?

Scenario:

I have 3 nodes with 2 shards in each node.
[P0 R2] [P1 R0] [P2 R1]

From documentation I know that:
shard_num = hash(_routing) % num_primary_shards

I want to route bucket1's data... The hash of (bucket1) = 9; 9 % 3 = 0, so this data goes to P0.
Then, I want to route bucket2's data... The hash of (bucket2) = 33; 33 % 3 = 0, so this data goes to P0 too.
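
To make the collision concrete, here is a minimal Python sketch of the formula above. The hash values 9 and 33 are the illustrative numbers from this post, not real Elasticsearch hashes, and `shard_for` is a hypothetical helper name used only for this example.

```python
def shard_for(routing_hash: int, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards to a precomputed hash."""
    return routing_hash % num_primary_shards


# Illustrative hash values taken from the example above (assumed, not real hashes).
bucket_hashes = {"bucket1": 9, "bucket2": 33}
num_primary_shards = 3

assignments = {
    name: shard_for(h, num_primary_shards) for name, h in bucket_hashes.items()
}
print(assignments)  # {'bucket1': 0, 'bucket2': 0}
```

Both routing values land on primary shard 0, even though P1 and P2 are empty.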

Any ideas if there is a way around it?

There's not really a way to do that in Elasticsearch. Routing only guarantees that one routing value always maps to a certain subset of shards... but it doesn't prevent other routing values from also mapping to the same set or subset of shards.

If you absolutely must have this level of separation, I think an index is the smallest unit of division that you can use. E.g. each customer gets their own index, and you can control how and where that index is allocated (and prevent other customers' data from being indexed into it). I would just try to keep each index to as few shards as possible, since having too many shards causes performance problems. Ideally most of the indices will have a single shard.
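
As a rough sketch of the index-per-customer idea, the Python snippet below only builds the index name and the settings body you might send when creating such an index; it does not talk to a cluster. The helper `customer_index_request`, the `customer-` naming scheme, and the node names `node-1` and `node-2` are assumptions for illustration. The settings used are `index.number_of_shards` and the shard allocation filter `index.routing.allocation.include._name`.

```python
def customer_index_request(customer_id: str, node_names: list[str]) -> tuple[str, dict]:
    """Build a (hypothetical) index name and create-index body for one customer."""
    # Index names must be lowercase, so normalise the customer id.
    index_name = f"customer-{customer_id.lower()}"
    body = {
        "settings": {
            # One primary shard per customer index keeps the shard count low.
            "index.number_of_shards": 1,
            # Only allow this index's shards on the listed nodes (assumed names).
            "index.routing.allocation.include._name": ",".join(node_names),
        }
    }
    return index_name, body


index_name, body = customer_index_request("bucket1", ["node-1", "node-2"])
print(index_name)
print(body)
```

Listing more than one node under `include` leaves room for a replica, since a primary and its replica cannot be allocated to the same node.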
