We have a relatively large cluster, and one issue we see from time to time is uneven disk usage across nodes, because Elasticsearch balances by shard count rather than by shard resource consumption. All nodes end up with similar shard counts, as expected, but a few nodes may have been favored for small shards or 0-doc indexes. I can address the 0-doc indexes relatively easily, but the small indexes/shards are somewhat intentional, in that ILM ages that data out according to our expected retention. (So I do not want to simply make all shards equal in size.)
Does anyone have some easy-to-consume resources on balancing by disk usage as well?
Yep, it balances by shard count. I have seen people change balancing settings, but it's not something that we recommend.
Are you crossing any watermark levels with things as they are? What sort of differences are you seeing between the nodes? (Would a _cat/nodes?v&h=id,v,rp,dt,du,dup be possible to share?)
The issue we see is that, in aggregate, this cluster has enough space for the daily load of logging and metrics, but for whatever reason the actual disk usage is not spread uniformly. Shards are balanced across the cluster at this point in time and, barring the master/ML nodes (there are 3 masters and 2 ML nodes), there are data nodes with as little as around 300GiB of space and data nodes with as much as 1.8TiB.
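For anyone wanting to quantify this kind of spread, here is a minimal Python sketch that summarizes per-node disk usage from the _cat/nodes output requested above. It assumes the rows were fetched with format=json and bytes=b, so that dt and du arrive as plain byte counts in strings, keyed by the requested column aliases. The function name disk_spread and the sample rows are hypothetical and only illustrate the shape of the data, not this cluster's real numbers.

```python
from typing import Dict, List, Optional

GIB = 1024 ** 3


def disk_spread(rows: List[Dict[str, Optional[str]]]) -> Dict[str, object]:
    """Summarize free space and used percentage across nodes.

    Assumes each row has 'id', 'dt' (disk total) and 'du' (disk used),
    with dt/du given as byte counts (i.e. the request used bytes=b).
    """
    usable = []
    for row in rows:
        try:
            total = int(row["dt"])
            used = int(row["du"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable disk statistics
        if total <= 0:
            continue
        usable.append((row["id"], total - used, 100.0 * used / total))

    if not usable:
        raise ValueError("no nodes with disk statistics")

    least = min(usable, key=lambda n: n[1])  # node with the least free space
    most = max(usable, key=lambda n: n[1])   # node with the most free space
    return {
        "least_free_node": least[0],
        "least_free_bytes": least[1],
        "most_free_node": most[0],
        "most_free_bytes": most[1],
        "min_used_pct": min(n[2] for n in usable),
        "max_used_pct": max(n[2] for n in usable),
    }


# Hypothetical sample rows, shaped like the JSON form of _cat/nodes.
sample_rows = [
    {"id": "a", "dt": str(2000 * GIB), "du": str(1700 * GIB)},
    {"id": "b", "dt": str(2000 * GIB), "du": str(200 * GIB)},
    {"id": "c", "dt": None, "du": None},  # node without disk stats, skipped
]
summary = disk_spread(sample_rows)
```

The rows could come from a request such as GET _cat/nodes?format=json&bytes=b&h=id,dt,du,dup. Master and ML nodes would need to be filtered out separately (for example by also requesting the node role column) if only data nodes should be compared.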