How to distribute documents across shards equally, using _routing in Elasticsearch

likhith1995 · March 28, 2022, 6:03am

Hi Team,

We are ingesting data into Elasticsearch using 3 routing values (3g, 4g and 5g). The index has 3 primary shards and 1 replica. For testing, rollover is configured at 1GB (maximum size per primary shard).

When we send only 3g data, one shard fills up completely while the other two stay empty. Rollover happens once that shard reaches 1GB, so the remaining two shards are still empty when the index rolls over.

We also tried configuring index.routing_partition_size, but the data still isn't distributed as expected.

Kindly let us know if there is any approach that would distribute the data evenly.

Thanks and regards,
Likhith P
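For reference, here is a rough simulation of why index.routing_partition_size only partly helps in this setup. As I understand the Elasticsearch _routing documentation, the shard is computed from hash(_routing) + hash(_id) % routing_partition_size, the setting must be greater than 1 and less than index.number_of_shards, and _routing must be marked as required in the mapping. With 3 primary shards the only allowed value is therefore 2, so each routing value can spread over at most 2 of the 3 shards. This is a sketch under stated assumptions: MD5 stands in for the Murmur3 hash Elasticsearch actually uses, the number of routing shards is assumed equal to the number of primaries, and the document IDs are hypothetical.

```python
import hashlib
from collections import Counter


def stand_in_hash(value: str) -> int:
    # ASSUMPTION: Elasticsearch hashes routing values with Murmur3; MD5 is only a
    # deterministic stand-in here, so shard numbers will not match a real cluster.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def shard_for(routing: str, doc_id: str, num_primary_shards: int,
              routing_partition_size: int = 1) -> int:
    # Simplified form of the documented formula, ASSUMING the number of routing
    # shards equals the number of primary shards (routing factor of 1):
    #   routing_value = hash(_routing) + hash(_id) % routing_partition_size
    #   shard_num     = routing_value % num_primary_shards
    routing_value = stand_in_hash(routing) + stand_in_hash(doc_id) % routing_partition_size
    return routing_value % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(3000)]  # hypothetical document IDs

# Only "3g" traffic, as in the question, on an index with 3 primary shards.
default_spread = Counter(shard_for("3g", d, 3) for d in doc_ids)
partitioned_spread = Counter(shard_for("3g", d, 3, routing_partition_size=2) for d in doc_ids)

print(default_spread)      # a single shard receives all 3000 documents
print(partitioned_spread)  # the documents are split across at most 2 shards
```

Note that a partitioned routing value also has to be searched on every shard it can land on, so widening the partition works against the goal of querying fewer shards.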

Christian_Dahlqvist · March 28, 2022, 6:15am

Why are you using routing in the first place? If you do not use it, data will be evenly distributed. If you use routing, all data related to a specific routing value will go to a single shard, and several different routing values can also end up on the same shard. With only a few routing values you are therefore likely to cause imbalances.
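The behaviour described above can be sketched with the shard-selection formula from the Elasticsearch _routing documentation: routing_factor = num_routing_shards / num_primary_shards and shard_num = (hash(_routing) % num_routing_shards) / routing_factor. This is an illustration only: MD5 is an assumed stand-in for Elasticsearch's Murmur3 hash, so the actual shard numbers will differ, but the properties that matter (one value always maps to one shard, and different values can collide) hold for any deterministic hash.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def stand_in_hash(value: str) -> int:
    # ASSUMPTION: stand-in for Elasticsearch's Murmur3 routing hash. Real shard
    # numbers will differ; only the deterministic, value-based behaviour matters.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def shard_for(routing: str, num_primary_shards: int,
              num_routing_shards: Optional[int] = None) -> int:
    # Documented formula:
    #   routing_factor = num_routing_shards / num_primary_shards
    #   shard_num      = (hash(_routing) % num_routing_shards) / routing_factor
    if num_routing_shards is None:
        num_routing_shards = num_primary_shards
    routing_factor = num_routing_shards // num_primary_shards
    return (stand_in_hash(routing) % num_routing_shards) // routing_factor


# Which routing values land on which of the 3 primary shards?
placement = defaultdict(list)
for value in ["3g", "4g", "5g"]:
    placement[shard_for(value, num_primary_shards=3)].append(value)

print(dict(placement))
# With only 3 values and 3 shards, two or more values often share a shard,
# and any shard with no value assigned stays empty.
```

With 3 values hashed uniformly onto 3 shards, the chance that every value gets its own shard is only 3!/27 = 2/9, which is why a handful of routing values rarely balances.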

likhith1995 · March 28, 2022, 6:24am

Hi Christian,

Thanks for your reply.

We are currently receiving large volumes of data, and every query searches all the shards in our data stream. We are therefore using routing to reduce query response time by reducing the number of shards each query hits.

We might have 3 or 4 routing values in our current implementation. Do you have any suggestions on how to configure routing so that the shards are fully utilized for each routing value?

Thanks and regards,
Likhith P
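For context, this is the query-time side of the routing approach: the search API accepts a routing parameter (a comma-separated list of values), and only the shards those values hash to are searched. The sketch below only builds the request URL; the helper, the host and the data stream name my-data-stream are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode


def routed_search_url(base_url: str, target: str, routing_values: Sequence[str]) -> str:
    # The search API's "routing" parameter takes a comma-separated list; only the
    # shards those values hash to are searched instead of every shard.
    query = urlencode({"routing": ",".join(routing_values)})
    return f"{base_url.rstrip('/')}/{target}/_search?{query}"


# Hypothetical cluster address and data stream name.
print(routed_search_url("http://localhost:9200", "my-data-stream", ["3g"]))
# http://localhost:9200/my-data-stream/_search?routing=3g
```

The flip side is the imbalance described earlier in the thread: because one value always hashes to the same shard, all documents for that value accumulate there.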

warkolm · March 28, 2022, 6:27am

Why not just have a few different indices?

likhith1995 · March 28, 2022, 7:01am

Yes, we can.

However, separating indices by user1/user2/user3... would affect how we configure the input to Elastic. It would also create many data streams, and index templates would have to be configured for each of them separately.

For that reason, we want to use routing to limit the number of shards that are queried.

Thanks and regards,
Likhith P

Christian_Dahlqvist · March 28, 2022, 7:37am

Routing is primarily useful when you have a large number of routing values. I would not recommend it for your use case. Separate data streams would be better as each stream could roll over separately based on data volume.
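One way to follow this advice without routing is to choose the target data stream from the same field that would otherwise have been the routing value. A minimal sketch that builds a _bulk request body, assuming a hypothetical field named network holding 3g/4g/5g and assuming an index template with data_stream enabled matches the hypothetical pattern network-*. Data streams are append-only, so each action uses create.

```python
import json
from typing import Iterable, Mapping


def bulk_body(docs: Iterable[Mapping]) -> str:
    """Build an NDJSON _bulk body that sends each document to its own data stream."""
    lines = []
    for doc in docs:
        # HYPOTHETICAL naming scheme: one data stream per value, e.g. "network-3g".
        target = f"network-{doc['network']}"
        # Data streams are append-only, so bulk requests must use the "create" action.
        lines.append(json.dumps({"create": {"_index": target}}))
        lines.append(json.dumps(doc))
    # A bulk request body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"@timestamp": "2022-03-28T06:00:00Z", "network": "3g", "value": 1},
    {"@timestamp": "2022-03-28T06:00:01Z", "network": "5g", "value": 2},
]
print(bulk_body(docs), end="")
```

A query for a single value then targets one stream (for example network-3g/_search), while a query across all values can use a wildcard such as network-*/_search.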

likhith1995 · March 28, 2022, 7:54am

Thank you.

If we split the data by user1, user2 and user3, the number of data streams reaches 1482 over 2 years. Is this approach healthy for an Elastic cluster?

Regards,
Likhith P

Christian_Dahlqvist · March 28, 2022, 7:56am

Why would you have 1482 data streams for 3 users?? Would you not have one data stream per user with a single primary shard and set the rollover size to somewhere between 25GB and 50GB?
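A sketch of the request bodies for this setup, written as Python dicts and printed as Dev Tools requests. Assumptions: Elasticsearch 7.13 or later (for the max_primary_shard_size rollover condition), hypothetical names network-rollover and network-streams, 50gb at the top of the suggested range, and a 30-day max_age for roughly monthly indices. A single template with a wildcard pattern applies to every data stream whose name matches it, so each new stream does not by itself need its own template.

```python
import json

# HYPOTHETICAL policy name "network-rollover".
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over when the primary shard reaches 50GB or the index is 30 days old.
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                }
            }
        }
    }
}

# HYPOTHETICAL template name "network-streams".
index_template = {
    # One pattern covers network-3g, network-4g, network-5g, ...
    "index_patterns": ["network-*"],
    "data_stream": {},
    "priority": 200,
    "template": {
        "settings": {
            "index.number_of_shards": 1,  # single primary shard per backing index
            "index.number_of_replicas": 1,
            "index.lifecycle.name": "network-rollover",
        }
    },
}

print("PUT _ilm/policy/network-rollover")
print(json.dumps(ilm_policy, indent=2))
print("PUT _index_template/network-streams")
print(json.dumps(index_template, indent=2))
```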

likhith1995 · March 28, 2022, 10:36am

We have close to 20 zoom levels for each month, and in the current setup we are planning for 2 years, so 20*24 = 480.

For 3 users we need 480*3 data streams.

Christian_Dahlqvist · March 28, 2022, 11:41am

I do not understand. Data streams are generally used for time-series data, so I do not understand why you would create one per month. If you have 3 users and 20 different types of data that need separate streams (is that really the case?) you would end up with 60 streams. Each stream would in turn be backed by a number of indices covering a specific time period. You could e.g. set each stream to generate a new index once it covers a full month or is larger than e.g. 50GB in size.

It would probably help if you explained your use case in more detail.
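The difference between the two layouts can be worked through with the numbers from the thread (3 users, about 20 types, 24 months). This is plain arithmetic; it assumes each stream rolls over roughly once a month and ignores extra rollovers triggered by the size limit.

```python
USERS = 3
DATA_TYPES = 20   # the "close to 20 zoom levels" mentioned in the thread
MONTHS = 24       # the two-year horizon mentioned in the thread

# One stream per user, per type, per month (the layout described in the thread).
streams_if_split_by_month = USERS * DATA_TYPES * MONTHS

# One stream per user and type; months become backing indices via rollover.
streams_if_split_by_type = USERS * DATA_TYPES

# ASSUMPTION: a roughly monthly max_age, so each stream gains about one backing
# index per month (more if the size limit triggers first).
approx_backing_indices = streams_if_split_by_type * MONTHS

print(streams_if_split_by_month)  # 1440
print(streams_if_split_by_type)   # 60
print(approx_backing_indices)     # 1440
```

The index count ends up similar either way; what changes is that there are 60 streams to name and manage instead of 1440, with the time-based split handled by rollover.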

system · April 25, 2022, 11:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
