I have an Elasticsearch cluster deployed on Kubernetes on AWS. I'm observing that free disk space is not equal across the data nodes.
I have 4 data nodes, of which:
Two data nodes have around 1 TB of free disk space each
One data node has 800 GB free
And the last one has 500 GB free
Can you please guide me on how to analyse the cause of this issue and how to optimise the disk usage?
The cause of this issue is how Elasticsearch works: it balances shards by count, trying to keep the same number of shards on every node.
The problem is that different kinds of data produce shards of different sizes and, at least before version 8.6, Elasticsearch does not take shard size into account when balancing, so you may end up with a couple of big shards on some nodes and small shards on others.
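To check whether this is what is happening on your cluster, you can compare the shard count and the total shard size per node. The sketch below assumes you have already fetched the output of `GET _cat/shards?format=json&bytes=b` and parsed it into a list of dicts; the function name and the sample rows (node names, index names, sizes) are made up for illustration and are not data from your cluster.

```python
from collections import defaultdict


def store_bytes_per_node(shard_rows):
    """Sum shard count and on-disk store size per node.

    shard_rows is assumed to be the parsed JSON returned by
    GET _cat/shards?format=json&bytes=b (one dict per shard copy).
    """
    totals = defaultdict(lambda: {"shards": 0, "bytes": 0})
    for row in shard_rows:
        node = row.get("node")
        store = row.get("store")
        # Unassigned shards have no node or store size, so skip them.
        if not node or store is None:
            continue
        totals[node]["shards"] += 1
        totals[node]["bytes"] += int(store)
    return dict(totals)


# Hypothetical sample rows, only to show the shape of the input.
sample_rows = [
    {"index": "logs-a", "shard": "0", "prirep": "p", "node": "data-0", "store": "20000000000"},
    {"index": "logs-b", "shard": "0", "prirep": "p", "node": "data-0", "store": "20000000000"},
    {"index": "metrics-a", "shard": "0", "prirep": "p", "node": "data-1", "store": "2000000000"},
    {"index": "metrics-b", "shard": "0", "prirep": "p", "node": "data-1", "store": "2000000000"},
    {"index": "new-index", "shard": "0", "prirep": "r", "node": None, "store": None},
]

per_node = store_bytes_per_node(sample_rows)
for node, stats in sorted(per_node.items()):
    print(f"{node}: {stats['shards']} shards, {stats['bytes'] / 1e9:.1f} GB")
```

If the nodes hold a similar number of shards but very different totals in bytes, the imbalance comes from shard sizes rather than shard counts. `GET _cat/allocation?v` gives a similar per-node summary directly.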
For many years Elastic recommended avoiding small indices on the cluster, but in the last couple of years Elastic itself has not followed this guideline, as the integrations and system indices create hundreds of small indices. This may have changed a little, but small indices still result in small shards, which can lead to some nodes having more disk usage than others.
Which version are you on? In version 8.6 Elastic introduced a new heuristic setting that also considers shard size while rebalancing. This should help keep the cluster more evenly balanced by avoiding one node holding only big shards while other nodes hold only small ones.
For example, consider a 3-node cluster and 6 indices with 1 shard each.
index1 and index2 are 20 GB each
index3 and index4 are 10 GB each
index5 and index6 are 2 GB each
On versions before 8.6, Elasticsearch may put index1 and index2 on one node and index5 and index6 on another, so you would have one node with a disk usage of 40 GB and another with a disk usage of 4 GB. From version 8.6 on, this is not expected to happen and the shards should be more evenly balanced.
I'm not on version 8.6 yet, so I cannot confirm whether this works well or not.
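The difference in the example above can be shown with a toy model. This is only a sketch: the greedy "largest shard to the least-used node" rule below is a stand-in chosen for illustration, not Elasticsearch's actual balancing algorithm, and `place_by_size` and `disk_usage` are hypothetical helper names.

```python
def place_by_size(shard_sizes_gb, node_count):
    """Toy size-aware placement: biggest shard first, onto the emptiest node.

    This is NOT the Elasticsearch allocator, just a simple greedy heuristic.
    """
    nodes = [[] for _ in range(node_count)]
    ordered = sorted(shard_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        # Pick the node that currently holds the least data.
        target = min(nodes, key=lambda shards: sum(s for _, s in shards))
        target.append((name, size))
    return nodes


def disk_usage(nodes):
    """Total GB stored on each node."""
    return [sum(size for _, size in shards) for shards in nodes]


sizes = {"index1": 20, "index2": 20, "index3": 10,
         "index4": 10, "index5": 2, "index6": 2}

# One placement that is perfectly balanced by shard count (2 per node)
# but badly balanced by size, as in the example above.
count_only = [
    [("index1", 20), ("index2", 20)],
    [("index3", 10), ("index4", 10)],
    [("index5", 2), ("index6", 2)],
]
size_aware = place_by_size(sizes, node_count=3)

print("count-balanced only:", disk_usage(count_only))
print("size-aware (toy):   ", disk_usage(size_aware))
```

Both placements put 2 shards on every node, so a balancer that looks only at shard count sees them as equally good, even though the per-node disk usage is very different.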