Elasticsearch Shard distribution size differs enormously

I need to load 1.2 billion documents in the elasticsearch. As of today we have 6 nodes in the cluster. To equally distribute the shards among the 6 nodes I have mentioned the number of shards to be 42. I use spark and it takes me almost 3 days load the index. The shards distribution looks so off.

The node6 only has two shards in it while node 2 has almost 10 shards. The size distribution is also not even. Some shards are 114.6gb while some are just 870mb within the same node.

I have tried to figure out the solution too. I can include the index.routing.allocation.total_shards_per_node: 7 while creating the index and make it evenly distribute. Will forcing the designated amount of shards in the node, crash the node if there is not enough resource available?

I want to size the shards evenly. My index size is 900 gb apprx. I want each shards to be atleast 20 gb. Could I use the following setting while creating the index? max_primary_shard_size: 25gb Is setting up max shard size only possible through ilm policy and will I require roll over policy for that ? I am not too familiar with the ilm. Sorry if this does not make sense.

The main reason I am trying to optimize the index is because I am getting timeout error on my application when I am querying the elastic search. I know I can increase my timeout time in my application and do some query optimization, but first I want to optimize my index and make my application as fast as possible.

I load the index only one time and do not write any documents to it after onetime load. For additional data, which i load every 15 days, I create a different index and use an alias name on the both the indexes to query. Other than sharding if there is any suggestion to optimize my indexes I will really appreciate it. It takes me 3 days just to load the data so it is quite difficult to experiment.

Welcome to our community! :smiley:
(I think you asked this on stackoverflow?)

What does your indexing process look like? Are you indexing with a custom routing value?

So sorry about that. I have followed up in the stack overflow. You can close this topic over here and we can interact in the stackoverflow. Link for the discussion on stackoverflow is as below:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.