Balancing disk usage on large clusters?

We have a relatively large cluster, and one consistent issue we see from time to time is uneven disk usage, because ElasticSearch balances by shard count rather than by shard resource consumption. All nodes end up with similar shard counts, as expected, but a few nodes may have been favored for small shards or 0-doc indexes. While I can address the 0-doc indexes relatively easily, the small indexes/shards are somewhat purposeful: ILM will age that data out according to our expected retention, so I do not want to just try to make all shards equal in size.
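For anyone else digging into this, one quick way to see the shard-count-versus-disk mismatch per node is the _cat/allocation API. A minimal example, assuming Elasticsearch is reachable on localhost:9200:

# per-node shard counts alongside disk usage, worst disk.percent first
curl -s 'localhost:9200/_cat/allocation?v&s=disk.percent:desc'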

Does anyone have some easy-to-consume resources on how to balance by disk usage more efficiently as well?


Yep, it balances by shard count. I have seen people change balancing settings, but it's not something that we recommend.
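For reference, the knobs people tune are the balancer weights. A sketch of what that change looks like, with the 7.x default values shown purely for illustration (host assumed; as noted, changing these is not generally recommended):

# shard/index weights and the threshold the balancer uses; values below are the defaults
curl -s -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.45,
    "cluster.routing.allocation.balance.index": 0.55,
    "cluster.routing.allocation.balance.threshold": 1.0
  }
}'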

Are you crossing any watermark levels with things as they are? What sort of differences are you seeing between the nodes? (Would a _cat/nodes?v&h=id,v,rp,dt,du,dup be possible to share?)
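That is, something along these lines (host assumed):

curl -s 'localhost:9200/_cat/nodes?v&h=id,v,rp,dt,du,dup'

where rp is ram.percent, dt/du are disk.total/disk.used, and dup is disk.used_percent.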

(PS it's Elasticsearch, no capital S 🙂)

id   v      rp     dt    du   dup
Goe5 7.6.2  96  6.9tb 6.7tb 95.96
uclB 7.6.2  87  6.9tb 5.8tb 84.45
KxA4 7.6.2  99  6.9tb 6.5tb 94.26
VAwV 7.6.2  97  6.9tb   6tb 86.19
oZIl 7.6.2  97  6.9tb 6.3tb 90.38
-k0_ 7.6.2  99  6.9tb 6.6tb 95.68
_Asj 7.6.2  98  6.9tb 6.6tb 95.77
_fn_ 7.6.2  98  6.9tb 6.3tb 91.50
EiqT 7.6.2  89 17.4gb 5.8gb 33.73
l9ce 7.6.2  98 17.4gb 4.9gb 28.09
DQp6 7.6.2  98  6.9tb 6.5tb 93.79
s93T 7.6.2  98  7.2tb 6.4tb 88.89
QYoq 7.6.2  75 17.4gb 4.4gb 25.23
3rx_ 7.6.2  98  6.9tb 6.2tb 89.02
7iqI 7.6.2  79 17.4gb 5.1gb 29.53
xOAX 7.6.2  96  6.9tb 6.3tb 90.77
21pb 7.6.2 100  7.2tb 6.7tb 92.27
3xj1 7.6.2  97  6.9tb 6.2tb 89.28
_NE3 7.6.2  94  6.9tb 6.6tb 95.08
55ca 7.6.2  99  6.9tb 6.5tb 94.50
AArZ 7.6.2  95  6.9tb 6.6tb 94.91
dsL3 7.6.2  98  7.2tb 5.3tb 73.87
3Afq 7.6.2  88 17.4gb 4.9gb 28.12

The issue we see is that, in aggregate, this cluster has enough space for the daily load of logging and metrics, but the actual disk usage is not spread uniformly. Shards are balanced across the cluster at this point in time, yet barring the master/ML nodes (there are 3 masters and 2 ML nodes) we can see data nodes with as little as around 300 GiB of free space and data nodes with as much as 1.8 TiB.
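In case it helps anyone hitting the same thing, the watermark levels asked about above can be checked like this (host assumed):

# defaults are low=85%, high=90%, flood_stage=95%
curl -s 'localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true' | grep watermark

If the defaults are in place, several nodes in the table above (dup over 90, a few over 95) would already be past the high and even flood-stage watermarks.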
