Elasticsearch Shard distribution size differs enormously

sauasmast · August 9, 2021, 5:09pm

I need to load 1.2 billion documents into Elasticsearch. As of today we have 6 nodes in the cluster. To distribute the shards equally among the 6 nodes, I set the number of shards to 42. I use Spark, and it takes almost 3 days to load the index. The shard distribution looks way off.

node6 has only two shards, while node2 has almost 10. The sizes are not even either: some shards are 114.6 GB while others are just 870 MB within the same node.

I have tried to find a solution too. I could set index.routing.allocation.total_shards_per_node: 7 when creating the index to make the shards distribute evenly. Will forcing a fixed number of shards onto a node crash the node if it does not have enough resources available?
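For reference, here is a minimal sketch of the create-index body this would amount to, written in Python so the per-node cap is computed rather than hard-coded. The node and shard counts come from the post. The replica count of 0 and the helper name `build_index_body` are assumptions for illustration. Keep in mind that `total_shards_per_node` counts replicas as well as primaries and is a hard limit, so a cap that is too low can leave shards unassigned.

```python
import json

NODES = 6            # from the post
PRIMARY_SHARDS = 42  # from the post
REPLICAS = 0         # assumption: the post does not mention replicas


def build_index_body(nodes, primaries, replicas):
    """Build a create-index body with a per-node shard cap (hypothetical helper)."""
    # The cap applies to primaries and replicas together.
    total_shards = primaries * (1 + replicas)
    # Ceiling division, so the cap is never lower than an even spread needs.
    per_node = -(-total_shards // nodes)
    return {
        "settings": {
            "index.number_of_shards": primaries,
            "index.number_of_replicas": replicas,
            "index.routing.allocation.total_shards_per_node": per_node,
        }
    }


body = build_index_body(NODES, PRIMARY_SHARDS, REPLICAS)
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of the create-index request. With one replica the same helper would give a cap of 14 instead of 7.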

I want the shards to be evenly sized. My index size is approximately 900 GB, and I want each shard to be at least 20 GB. Could I use the following setting when creating the index? max_primary_shard_size: 25gb. Is setting a maximum shard size only possible through an ILM policy, and will I need a rollover policy for that? I am not too familiar with ILM. Sorry if this does not make sense.
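The sizing arithmetic behind those numbers can be sketched as follows. The 900 GB, 20 GB, 25 GB, 42-shard and 6-node figures come from the post. The function name `shard_count_bounds` is made up for illustration, and the calculation assumes documents are spread perfectly evenly, which is not what the cluster currently shows.

```python
import math

INDEX_SIZE_GB = 900   # approximate index size from the post
MIN_SHARD_GB = 20     # desired lower bound per shard
MAX_SHARD_GB = 25     # the max_primary_shard_size value being considered
NODES = 6
CURRENT_SHARDS = 42


def shard_count_bounds(index_size_gb, min_shard_gb, max_shard_gb):
    """Return (fewest, most) primary shards that keep the average shard
    size between min_shard_gb and max_shard_gb."""
    fewest = math.ceil(index_size_gb / max_shard_gb)
    most = math.floor(index_size_gb / min_shard_gb)
    return fewest, most


fewest, most = shard_count_bounds(INDEX_SIZE_GB, MIN_SHARD_GB, MAX_SHARD_GB)

# Shard counts in that range that also divide evenly across the nodes.
even_counts = [n for n in range(fewest, most + 1) if n % NODES == 0]

# Average shard size with the current shard count, if data were spread evenly.
avg_gb = INDEX_SIZE_GB / CURRENT_SHARDS

print(f"shard count range: {fewest}..{most}")
print(f"counts divisible by {NODES}: {even_counts}")
print(f"average size at {CURRENT_SHARDS} shards: {avg_gb:.1f} GB")
```

Under these assumptions 42 shards already falls inside the target range, which suggests the shard count itself is reasonable and the unevenness comes from how documents are assigned to shards.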

The main reason I am trying to optimize the index is that I am getting timeout errors in my application when I query Elasticsearch. I know I can increase the timeout in my application and do some query optimization, but first I want to optimize my index and make my application as fast as possible.

I load the index only once and do not write any documents to it after that one-time load. For additional data, which I load every 15 days, I create a different index and use an alias over both indexes to query. Other than sharding, if there are any suggestions for optimizing my indexes I would really appreciate them. It takes me 3 days just to load the data, so it is quite difficult to experiment.

warkolm · August 10, 2021, 3:40am

Welcome to our community!
(I think you asked this on Stack Overflow?)

What does your indexing process look like? Are you indexing with a custom routing value?
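To illustrate why the routing question matters, here is a rough simulation of how a routing value picks a shard. It is only a sketch: Elasticsearch hashes the routing value (the document `_id` by default) with its own Murmur3-based function, while this code uses MD5 from the standard library as a stand-in, and the "customer" routing scheme is entirely hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 42   # from the original post
N_DOCS = 100_000  # small sample, not the real 1.2 billion


def shard_for(routing_value, num_shards):
    """Simplified shard selection: hash(routing) % number_of_shards.
    MD5 is a stand-in for the hash Elasticsearch actually uses."""
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def docs_per_shard(routing_values, num_shards):
    """Count how many documents land on each shard."""
    counts = Counter(shard_for(v, num_shards) for v in routing_values)
    return [counts.get(s, 0) for s in range(num_shards)]


def customer_for(i):
    """Hypothetical custom routing key: one customer owns half the documents."""
    return "customer-0" if i % 2 == 0 else f"customer-{i % 50}"


# Default behaviour: every document is routed by its own unique id.
by_id = docs_per_shard((f"doc-{i}" for i in range(N_DOCS)), NUM_SHARDS)

# Custom routing: few distinct values, heavily skewed.
by_custom = docs_per_shard((customer_for(i) for i in range(N_DOCS)), NUM_SHARDS)

print("routed by id:     min", min(by_id), "max", max(by_id))
print("routed by custom: min", min(by_custom), "max", max(by_custom))
print("empty shards with custom routing:", by_custom.count(0))
```

With unique ids the per-shard counts come out close to each other, while a skewed routing key piles documents onto a few shards and leaves others nearly empty, similar to the 114.6 GB versus 870 MB spread described above.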

sauasmast · August 10, 2021, 3:09pm

So sorry about that. I have followed up on Stack Overflow. You can close this topic here and we can continue on Stack Overflow. The link to the discussion on Stack Overflow is below:

system · September 7, 2021, 3:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
