Disk usage difference between data nodes

Lior_Yakobov · June 14, 2021, 7:09am

Hello,
we have a pretty big cluster (26 data nodes, almost 100TB), and I have a question about the disk usage distribution across data nodes.
I know that Elasticsearch balances nodes by shard count rather than by disk usage, so when shard sizes vary widely, disk usage ends up differing greatly between nodes.

Is there any setting that can be changed to even out disk usage between nodes, so that balancing is based on usage rather than shard count?
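To make the problem concrete, here is a minimal sketch of how balancing by count alone can skew disk usage. It is a simplified round-robin illustration, not Elasticsearch's actual allocator, and the shard sizes and the `assign_by_count` helper are hypothetical.

```python
def assign_by_count(shard_sizes_gb, node_count):
    """Place shards round-robin so every node gets the same shard count.

    Shard size is deliberately ignored, which is the point of the illustration.
    Returns the total disk usage (GB) per node.
    """
    usage = [0.0] * node_count
    for i, size in enumerate(shard_sizes_gb):
        usage[i % node_count] += size
    return usage


# Hypothetical shard sizes in GB: a few large shards, several small ones.
shard_sizes_gb = [400, 10, 10, 350, 20, 15]

usage = assign_by_count(shard_sizes_gb, node_count=3)
for node, used in enumerate(usage):
    print(f"node-{node}: 2 shards, {used:.0f} GB")
```

Every node holds two shards, yet one node carries almost all of the data.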

Thanks,
Lior

porscheme · June 22, 2021, 8:50pm

I would love to know your deployment configuration; we have a similar requirement to build a 50TB ES cluster.

Lior_Yakobov · June 27, 2021, 2:40pm

Hey @porscheme,
can you be more specific?
I'll try to expand as much as possible; hopefully it will answer your question.

Our cluster runs on EC2 instances of type i3en.2xlarge, with Elasticsearch installed from RPM and the heap set to the maximum (31GB) on each data node. Each data node has 5TB of disk space.

Lior

porscheme · June 28, 2021, 5:27pm

Our data size at source is 50 TB. Besides accounting for organic growth of the data, how much storage should we allocate for ES overhead?
We want to use 10 Azure VMs of the Ls32 SKU (256 GB RAM, 32 CPUs, 4 x 2 TB NVMe premium SSD disks), with 700 shards of 75 GB each. Is this a good setup?
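As a rough check of the numbers above, the arithmetic can be sketched as follows. It assumes the disk specification is per VM, that 1 TB = 1000 GB, and that the 700 shards are the full set of shards to store (the post does not say whether replicas are included).

```python
# Figures taken from the post above.
shard_count = 700
shard_size_gb = 75
vm_count = 10
disks_per_vm = 4      # assumed to be per VM
disk_size_tb = 2

# Total data held in shards, in TB (decimal units assumed).
total_shard_tb = shard_count * shard_size_gb / 1000

# Raw disk capacity across the cluster, in TB.
raw_capacity_tb = vm_count * disks_per_vm * disk_size_tb

# Fraction of raw disk the shards would occupy, and shards per node
# if they were spread evenly by count.
fill_ratio = total_shard_tb / raw_capacity_tb
shards_per_node = shard_count / vm_count

print(f"shard data:      {total_shard_tb} TB")
print(f"raw capacity:    {raw_capacity_tb} TB")
print(f"fill ratio:      {fill_ratio:.1%}")
print(f"shards per node: {shards_per_node:.0f}")
```

Under these assumptions the shards alone would take about two thirds of the raw disk, before any growth or replicas.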

Lior_Yakobov · July 6, 2021, 2:15pm

Hey @porscheme,

First of all, I wouldn't call myself an expert,
but from my experience (maintaining this cluster for over 3.5 years) I can tell you that the maximum heap size is 31GB, so 64GB of RAM per node is the practical maximum.
Also, the shard size should be close to the heap size, so I believe it is better to stay around 30-40GB rather than 75GB.
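The heap reasoning above can be sketched as a small calculation. It assumes the commonly cited guideline of giving the heap about half of the node's RAM, capped at 31GB; the half-of-RAM rule is not stated in this thread, and `suggested_heap_gb` is a hypothetical helper, not an Elasticsearch API.

```python
def suggested_heap_gb(ram_gb, heap_cap_gb=31):
    """Return a heap size in GB: half of RAM, capped at heap_cap_gb.

    Both the 50% rule and the 31GB cap are assumptions for this sketch.
    """
    return min(ram_gb // 2, heap_cap_gb)


for ram_gb in (32, 64, 256):
    print(f"{ram_gb} GB RAM -> {suggested_heap_gb(ram_gb)} GB heap")
```

Under this rule, a 64GB node already reaches the 31GB cap, and a 256GB node gets no larger heap.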

If someone else has other insights/suggestions, I would like to hear them, but this is my opinion.

Lior

d.silwon · July 6, 2021, 2:40pm

Hello,

Please review my similar topic: Elasticsearch cluster uneven distribution of data

Regards,
Dan

