Hi
We are on Elasticsearch 8.6 with 38 hot data nodes, ingesting into about 140 different indices.
The top 10 indices have indexing rates of roughly 15-50K events/sec, about 20 indices run at 1-20K events/sec, and the remaining ~100 indices index at low rates of 0-1000 events/sec.
All nodes have 8 vCPUs, 32 GB RAM, and a 2 TB SSD.
As you can see in the table below, the cluster overfills some nodes while others are almost empty, even though all nodes have the same capacity and performance. This hurts indexing latency and overall cluster performance. Could you give me some ideas on how to fix this?

Cluster balancing parameters are at their defaults. I was thinking of raising cluster.routing.allocation.balance.disk_usage from its default of 2e-11f to something like 2e-9f, but I have no experience with how this value affects the rebalancing behavior.
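If I understand the cluster settings API correctly, the change would look roughly like this (the 2e-9 value is just my guess, not a recommendation I found anywhere, and I dropped the `f` suffix, which I believe is only Java float notation):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.disk_usage": 2e-9
  }
}
```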
node | shards | disk.indices | disk.total | disk.used | disk.percent |
---|---|---|---|---|---|
tela12_node | 31 | 420.8gb | 1.9tb | 427.3gb | 21 |
tela36_node | 108 | 726.6gb | 1.9tb | 734.7gb | 36 |
tela21_node | 75 | 728.6gb | 1.9tb | 734.6gb | 37 |
tela34_node | 155 | 929.2gb | 1.9tb | 934.9gb | 46 |
tela13_node | 209 | 908.2gb | 1.9tb | 914.8gb | 46 |
tela37_node | 142 | 951.5gb | 1.9tb | 957.7gb | 47 |
tela33_node | 114 | 1000gb | 1.9tb | 1008.1gb | 49 |
tela10_node | 224 | 1009.8gb | 1.9tb | 1016.3gb | 51 |
tela28_node | 217 | 1tb | 1.9tb | 1tb | 52 |
tela22_node | 221 | 1tb | 1.9tb | 1tb | 52 |
tela18_node | 199 | 1tb | 1.9tb | 1tb | 54 |
tela35_node | 142 | 1.1tb | 1.9tb | 1.1tb | 55 |
tela04_node | 195 | 1tb | 1.9tb | 1.1tb | 57 |
tela17_node | 216 | 1.1tb | 1.9tb | 1.1tb | 58 |
tela14_node | 214 | 1.1tb | 1.9tb | 1.1tb | 58 |
tela07_node | 215 | 1tb | 1.9tb | 1.1tb | 60 |
tela19_node | 236 | 1.2tb | 1.9tb | 1.2tb | 66 |
tela24_node | 219 | 1.3tb | 1.9tb | 1.3tb | 67 |
tela26_node | 219 | 1.3tb | 1.9tb | 1.3tb | 69 |
tela02_node | 222 | 1.2tb | 1.9tb | 1.3tb | 71 |
tela01_node | 232 | 1.3tb | 1.9tb | 1.4tb | 73 |
tela16_node | 187 | 1.4tb | 1.9tb | 1.4tb | 74 |
tela06_node | 203 | 1.3tb | 1.9tb | 1.4tb | 74 |
tela05_node | 179 | 1.3tb | 1.9tb | 1.4tb | 76 |
tela23_node | 206 | 1.5tb | 1.9tb | 1.5tb | 81 |
tela32_node | 153 | 1.6tb | 1.9tb | 1.6tb | 83 |
tela03_node | 213 | 1.5tb | 1.9tb | 1.6tb | 85 |
tela30_node | 123 | 1.7tb | 1.9tb | 1.7tb | 86 |
tela11_node | 176 | 1.6tb | 1.9tb | 1.7tb | 88 |
tela27_node | 177 | 1.7tb | 1.9tb | 1.7tb | 89 |
tela15_node | 149 | 1.7tb | 1.9tb | 1.7tb | 90 |
tela29_node | 130 | 1.7tb | 1.9tb | 1.7tb | 90 |
tela08_node | 120 | 1.6tb | 1.9tb | 1.7tb | 90 |
tela38_node | 127 | 1.8tb | 1.9tb | 1.8tb | 91 |
tela20_node | 94 | 1.7tb | 1.9tb | 1.7tb | 91 |
tela09_node | 126 | 1.6tb | 1.9tb | 1.7tb | 91 |
tela31_node | 159 | 1.7tb | 1.9tb | 1.8tb | 91 |
tela25_node | 172 | 1.7tb | 1.9tb | 1.7tb | 92 |
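The table above comes from the cat allocation API, approximately this query:

```
GET _cat/allocation?v&h=node,shards,disk.indices,disk.total,disk.used,disk.percent&s=disk.percent
```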
I set up the index templates approximately as follows: high-throughput indices get more primary shards plus a shards-per-node limit, so the indexing load is spread across more nodes' CPUs (a sketch of such a template follows after the table).
Indexing rate (K events/sec) | Number of primary shards | Shards per node |
---|---|---|
30+ | 15 | 1 |
20-30 | 15 | 1 |
15-20 | 10 | 1 |
10-15 | 8 | 1 |
5-10 | 6 | any |
1-5 | 3 | any |
0-1 | 1 | any |
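As an example, the template for a 30+ K events/sec index looks roughly like this (template name and index pattern here are placeholders, and replicas are left at the default):

```
PUT _index_template/hi-perf-logs
{
  "index_patterns": ["hi-perf-logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 15,
      "index.routing.allocation.total_shards_per_node": 1
    }
  }
}
```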
This is my fourth post on this topic; the new version has brought some improvements, but this issue is still ongoing.