Shards rebalance when adding nodes and using one primary and one replica

Travis · January 13, 2021, 8:31am

Hello !

I plan to use one primary and one replica by day.

If I add more nodes, does the shards will be automatically rebalanced equally on all the nodes even if I have only one primary and one replica ?

I'm asking this because in this article, they say that we must "overshard" to have room to add nodes and grow in the future :

Thanks for your feedback !

Christian_Dahlqvist · January 13, 2021, 8:57am

Each shard will only reside on a single node, so with one primary and one replica two nodes will be used for each index. Different indices will however be spread out across the available data nodes so all nodes will have data.

Whether to deviate from this sharding practice will depend on how much you index per day, now many indices you are indexing into and your query rate/patterns. Increasing the number of primary shards in order to spread out load may seem appealing, but can result in large number of very small shards which will be inefficient and cause a different set of problems.

If you tell us a bit more about your use case and expected data volumes and retention requirements we may be able to provide better giuidance.

Travis · January 13, 2021, 9:24am

OK thanks. So what is explained in linked article is wrong right ? Oversharding is not really necessary as balance/rebalance happen despite the number of primary shard per index and the number of nodes (correct me if I'm wrong)

Use case is log collection for 600GB/day. We would like a retention of 1 month in HOT and 6 in COLD nodes

Daily volume by data type (I think to create one daily index by data type and rotate firewall logs every 50G using ILM - you think it's a good idea ?) :

Firewall : 300G
Web : 60G
Unix : 30G
Windows : 50G
Switch : 10G
Wireless : 5G
Database : 4G

Christian_Dahlqvist · January 13, 2021, 10:02am

When using time-based indices there is rarely any need to overshard as you can change the index settings in your index template and have that apply to new indices if you need to scale out. Recent versions of Elasticsearch also support the split index API which can be used to split indices if shards end up too large.

I would recommend setting the number of primary shards per index based on the expected data volume and then use ILM to roll over based on actual size. There is often no need to have indices cover an exact time period as long as the time period each index covers aligns with the retention period, e.g. do not generate monthly indices if your retention period is relatively short.

Travis · January 13, 2021, 4:13pm

Thanks for the recommendation Christian

you speak about the load right ?

Christian_Dahlqvist · January 13, 2021, 4:16pm

It can depending on use case be load and/or capacity.

Travis · January 14, 2021, 3:35pm

Good to know. Thank you Christian !

system · February 11, 2021, 3:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Shard rebalancing after adding nodes Elasticsearch	3	634	March 10, 2019
Elastic Search 6.4.2 Primary shards on same node Elasticsearch	3	906	November 12, 2018
One node one primary shard Elasticsearch	1	246	September 17, 2021
Distributing primary shards? Elasticsearch	8	9027	December 30, 2016
Performance implications of multiple primary shards for one index on the same data node Elasticsearch	4	1466	June 24, 2021

Shards rebalance when adding nodes and using one primary and one replica

Related topics