I wonder why this seems problematic to you. This is perfectly normal Elasticsearch behavior. Elasticsearch does not balance by disk space; instead, it uses the low and high disk watermarks to ensure that no node fills its disk.
As long as your workload is served well, this is not a problem. Perhaps there is an underlying symptom you are chasing?
@Arethusa, does that mean the percentages you showed in your original post were CPU usage? If so, does that persist over the course of a day? Does it slow down any of your searches or indexing?
Elasticsearch does not balance by CPU usage, but it does have configurable rules that can help the balancer. However, these rules typically constrain the cluster from balancing and moving shards freely, which can lead to resilience issues if not used carefully. So before embarking on such tuning, it is worth making sure there is a client-visible issue to resolve.
I'm worried about seeing one Elasticsearch node near 90% disk usage and another near 50%.
This is perfectly normal behavior, as per my prior comment. If a node's disk usage exceeds 90% (the default high watermark), Elasticsearch will move shards off that node to nodes below 85% disk usage (the default low watermark). Those limits can be controlled via the watermark settings, but are typically left at their defaults.
As long as shards can be moved elsewhere you should thus never hit 100% disk usage on a single node (but could temporarily see nodes go slightly over 90% disk usage).
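To make the thresholds above concrete, here is a minimal Python sketch that classifies node disk usage against the default percentage watermarks (low 85%, high 90%, flood stage 95%) and lists which nodes could receive shards from an overfull node. It assumes the defaults of `cluster.routing.allocation.disk.watermark.low`, `.high` and `.flood_stage` are in effect, and it deliberately ignores shard sizes, replicas and every other allocation rule, so treat it as an illustration of the watermark logic rather than of Elasticsearch's actual allocator. The functions `watermark_state` and `relocation_plan` and the node names are hypothetical.

```python
# Simplified model of Elasticsearch's disk-based allocation thresholds.
# Assumption: the default percentage watermarks are in effect; clusters can
# override them via cluster.routing.allocation.disk.watermark.low/.high/.flood_stage.
LOW_PCT = 85.0    # above this, no new shards are allocated to the node
HIGH_PCT = 90.0   # above this, Elasticsearch tries to move shards off the node
FLOOD_PCT = 95.0  # above this, indices with shards on the node become read-only (deletes allowed)


def watermark_state(used_pct: float) -> str:
    """Classify a node's disk usage against the assumed default watermarks."""
    if used_pct > FLOOD_PCT:
        return "flood_stage"
    if used_pct > HIGH_PCT:
        return "above_high"
    if used_pct > LOW_PCT:
        return "above_low"
    return "ok"


def relocation_plan(nodes: dict[str, float]) -> dict[str, list[str]]:
    """Map every node above the high watermark to the nodes that could take its shards.

    Only nodes at or below the low watermark count as targets. Shard sizes,
    replicas and all other allocation rules are ignored in this sketch.
    """
    targets = sorted(name for name, pct in nodes.items() if watermark_state(pct) == "ok")
    return {
        name: targets
        for name, pct in sorted(nodes.items())
        if watermark_state(pct) in ("above_high", "flood_stage")
    }


if __name__ == "__main__":
    usage = {"es1": 91.0, "es2": 50.0, "es3": 87.0}  # hypothetical disk usage in %
    for node, pct in sorted(usage.items()):
        print(node, pct, watermark_state(pct))
    print(relocation_plan(usage))
```

With these made-up numbers, `es1` (91%) is above the high watermark and its only eligible target is `es2` (50%). `es3` (87%) is above the low watermark, so it accepts no new shards, but it is not yet asked to give any up.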
Your indices' shard settings may be the source of this. If you have only one shard per index, that index's data won't be spread across hosts, which is why some hosts may use more disk space than others.
Data is balanced by shards, and each shard sits on exactly one host (a replica of it may sit on another host, though). Say you have a big index with only one shard and no replica, plus 4 smaller indices. Every host will hold shards of the smaller indices, but the one hosting the big index will have much higher disk usage than the others.
Take what I said with a grain of salt, as it's based on my understanding of how ES works.
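The scenario above can be illustrated with a deliberately simplified Python model: each shard is placed on whichever node currently holds the fewest shards, with shard size ignored entirely. This is an assumption made for illustration, not Elasticsearch's real balancer, which weighs several factors. The function `place_by_count`, the node names and the index sizes are all hypothetical. The sketch simply shows how equal shard counts can still produce very unequal disk usage.

```python
# Simplified model (assumption): balance by NUMBER of shards per node and
# ignore shard size. Not Elasticsearch's actual balancing algorithm.


def place_by_count(shards: list[tuple[str, int]], nodes: list[str]):
    """Greedily assign each shard to the node currently holding the fewest shards.

    Ties are broken by node name so the result is deterministic.
    Returns (placement, shard_count_per_node, disk_gb_per_node).
    """
    counts = {n: 0 for n in nodes}
    disk_gb = {n: 0 for n in nodes}
    placement = {}
    for shard, size_gb in shards:
        target = min(nodes, key=lambda n: (counts[n], n))
        counts[target] += 1
        disk_gb[target] += size_gb
        placement[shard] = target
    return placement, counts, disk_gb


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    # Hypothetical data: one 400 GB index with a single shard and no replica...
    shards = [("big[0]", 400)]
    # ...plus four small indices, each with four 10 GB shards.
    shards += [(f"small{i}[{s}]", 10) for i in range(4) for s in range(4)]
    placement, counts, disk_gb = place_by_count(shards, nodes)
    print(counts)
    print(disk_gb)
```

Here every node ends up with one shard of each small index, and shard counts differ by at most one (5 on `node1`, 4 elsewhere). Disk usage, however, is 440 GB on `node1` versus 40 GB on each of the others, because the single large shard cannot be split across nodes. Giving the large index more primary shards would let its data spread out.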