Hello, I'm running into the following situation:
Version: 6.8.1
Number of nodes: 20 (15 nodes with 500G disks and 5 nodes with 2000G disks)
Right now the 500G disks fill up until they hit the disk watermark settings, while the 2000G nodes only end up holding the same amount of data as the 500G nodes. How can we make the cluster make more use of the 2000G nodes?
I would like disk usage to be balanced as a percentage of each node's capacity. Is there any way to achieve this?
Elasticsearch balances data by shard count, not by disk capacity; it basically assumes that all nodes have disks of the same size.
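To see why the 500G nodes fill up first, here is a simplified model (not the real allocator) that gives every node the same number of equally sized shards and reports how full each disk gets. The shard count and shard size are made-up numbers; the 90% value is the 6.8 default for cluster.routing.allocation.disk.watermark.high.

```python
# Simplified model, NOT the real Elasticsearch balancer: it just gives every
# node the same number of equally sized shards, which is roughly what
# shard-count balancing converges to when all shards are similar in size.

DISK_GB = [500] * 15 + [2000] * 5   # 15 small nodes, 5 big nodes (from the question)
HIGH_WATERMARK = 0.90               # default cluster.routing.allocation.disk.watermark.high


def usage_by_node(total_shards: int, shard_gb: float) -> list:
    """Spread shards round-robin (equal counts per node) and
    return the fraction of each node's disk that is used."""
    counts = [0] * len(DISK_GB)
    for i in range(total_shards):
        counts[i % len(DISK_GB)] += 1
    return [c * shard_gb / disk for c, disk in zip(counts, DISK_GB)]


if __name__ == "__main__":
    # Hypothetical workload: 400 shards of 20 GB each (8000 GB total).
    usage = usage_by_node(total_shards=400, shard_gb=20)
    print(f"500G nodes: {usage[0]:.0%}, 2000G nodes: {usage[-1]:.0%}")
    print(f"cluster-wide: {400 * 20 / sum(DISK_GB):.0%}")
```

With these numbers every node holds 400 GB, so the 500G nodes sit at 80% (close to the watermarks) while the 2000G nodes are at 20%, even though the cluster as a whole is only about 46% full.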
When you have different hardware profiles, even if the difference is just the disk size, you should work with data tiering or use custom attributes to do shard allocation filtering.
Version 6.8.X does not have any native way to do data tiering (hot, warm, cold), so your only option would be to use custom attributes to do shard allocation filtering.
Basically you would need to create a custom attribute in your elasticsearch.yml to group your 2000GB and 500GB nodes, for example node.attr.disk_size: big on the 2000GB nodes and node.attr.disk_size: small on the 500GB nodes.
Then you would need to edit your templates to use this attribute, as described in the example in the linked documentation.
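As a rough sketch of what that could look like, the snippet below builds the body for a legacy index template (PUT _template/<name>, the template API available in 6.8) that pins matching indices to the big nodes via index.routing.allocation.require.disk_size. The attribute name disk_size, the pattern logs-* and the shard/replica counts are hypothetical placeholders, not values from this thread.

```python
import json

# elasticsearch.yml on each 2000GB node (hypothetical attribute name):
#   node.attr.disk_size: big
# elasticsearch.yml on each 500GB node:
#   node.attr.disk_size: small


def template_body(index_pattern: str, disk_size: str,
                  shards: int = 1, replicas: int = 1) -> dict:
    """Body for PUT _template/<name> (legacy index templates, as in 6.8)."""
    return {
        "index_patterns": [index_pattern],
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            # Only allocate shards of matching indices to nodes whose
            # node.attr.disk_size equals this value.
            "index.routing.allocation.require.disk_size": disk_size,
        },
    }


if __name__ == "__main__":
    # Hypothetical pattern; save the output and send it with e.g.
    # curl -XPUT localhost:9200/_template/logs-big \
    #      -H 'Content-Type: application/json' -d @body.json
    print(json.dumps(template_body("logs-*", "big"), indent=2))
```

The same index.routing.allocation.require.disk_size setting can also be applied to an existing index through its _settings endpoint, which makes Elasticsearch relocate its shards. Keep in mind that anything pinned to "big" can only use those 5 nodes, so primaries plus replicas have to fit there.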
If you want to move the data from the 2000 GB nodes to the 500 GB nodes, you would need to use an ILM policy to move it after some time.
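A minimal sketch of such a policy, assuming the hypothetical disk_size attribute from above: the body for PUT _ilm/policy/<name> uses the ILM allocate action in a warm phase to switch the allocation requirement to the small nodes once the index reaches min_age. The 30d default is an arbitrary example, and the index template would also need "index.lifecycle.name" set to the policy name for indices to pick it up.

```python
def move_to_small_policy(min_age: str = "30d") -> dict:
    """Body for PUT _ilm/policy/<name>: after min_age, point the index's
    allocation filter at the 500GB nodes (hypothetical attribute disk_size)."""
    return {
        "policy": {
            "phases": {
                "warm": {
                    # Without rollover, min_age is measured from index creation.
                    "min_age": min_age,
                    "actions": {
                        "allocate": {
                            # Overwrites index.routing.allocation.require.disk_size,
                            # so the shards relocate from "big" to "small" nodes.
                            "require": {"disk_size": "small"}
                        }
                    },
                }
            }
        }
    }
```

Combined with a template that starts indices on "big", new (write-heavy) data lands on the 2000GB nodes and older data drifts to the 500GB nodes over time.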
But your main problem here is that you are using an ancient version: 6.8.1 was released 7 years ago and is no longer supported. It may even be hard to get help with issues, because a lot has changed since then and people may not remember how it worked.
You should plan an upgrade as soon as possible, but given how old it is, it may be easier to spin up a new cluster on version 9.X.
True, 6.8 is irresponsibly old these days, but upgrading won't fix this AFAIK. See e.g. these docs:
IMPORTANT: Elasticsearch assumes nodes within a data tier share the same hardware profile (such as CPU, RAM, disk capacity).
You could split each 2000G node into four 500G nodes, assuming those hosts also have 4x the RAM and CPU of the smaller nodes. Alternatively, make them a separate data tier as Leandro suggests. Either way, there's nothing automatic to help you here; this simply isn't something Elasticsearch is designed to handle.
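If you go the "four nodes per big host" route, each node needs its own elasticsearch.yml with a distinct node name and data/log paths. The sketch below generates such fragments; the host name, the /data/nodeN and /var/log/elasticsearch/nodeN paths, and the idea of carving the 2000G disk into four separate data paths are all hypothetical. cluster.routing.allocation.same_shard.host: true is set so that a primary and its replica are not placed on two nodes running on the same physical host.

```python
def split_node_configs(host: str, n: int = 4, base_path: str = "/data") -> list:
    """Return n elasticsearch.yml fragments for n nodes sharing one host.
    All names and paths here are hypothetical placeholders."""
    configs = []
    for i in range(1, n + 1):
        lines = [
            f"node.name: {host}-{i}",
            # One data path per node, ideally on its own ~500G volume.
            f"path.data: {base_path}/node{i}",
            # Separate log dirs so the nodes don't write to the same files.
            f"path.logs: /var/log/elasticsearch/node{i}",
            # Never put a primary and its replica on the same physical host.
            "cluster.routing.allocation.same_shard.host: true",
        ]
        configs.append("\n".join(lines) + "\n")
    return configs


if __name__ == "__main__":
    for cfg in split_node_configs("big-host-1"):
        print(cfg)
```

Each of those nodes would also need its own heap (jvm.options), which is why the extra RAM and CPU on the big hosts matters.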