Is there a way to rebalance data nodes by disk usage instead of shard count?
I have a cluster with a total of 846 indices, but 273 of them hold only a few KB of data while the rest are around 7 GB each, since they roll over at 7 GB. This is due to the way I index data by k8s namespace (i.e. <namespace>-000001). Pods in some namespaces rarely or never log, so those indices barely have any data.
So even though the shards are distributed evenly across all data nodes, disk usage is not: some data nodes are 75-80% full while others are only 50% full.
I need a way to address this, maybe by running a script as a k8s CronJob that rebalances by disk usage while keeping a roughly even number of shards per data node (or maybe the shard count doesn't matter anymore).
Elasticsearch balances by shard count until you start to hit the disk watermarks. In your case there's not much you can do other than alter your indexing strategy to compact some of the smaller sources.
The disk watermark has been changed from 15% to 10% and it's been OK so far, but a few nodes (2-3 out of 13) are still near 85% disk usage at the moment.
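For reference, the change was applied through the cluster settings API; the percentages below are illustrative rather than my exact production values:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}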
I'm not sure I can just compact the smaller sources, since they're still considered hot indices (whether actively being written to or not) according to my indexing strategy.
All warm indices (not actively being written to) are around 7 GB after being rolled over and shrunk.
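To give a rough idea of the indexing strategy, the lifecycle policy is along these lines (a simplified sketch; the policy name, rollover size, and shrink target here are placeholders, not my exact settings):

PUT _ilm/policy/k8s-namespace-logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "7gb" }
        }
      },
      "warm": {
        "actions": {
          "shrink": { "number_of_shards": 1 }
        }
      }
    }
  }
}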
I believe a rebalance based on disk usage can be done manually, so I'm trying to think of a good way to automate it.
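The manual version I have in mind would be to check per-node disk usage and then move individual shards with the reroute API, roughly like below (the index, shard, and node names are just placeholders):

GET _cat/allocation?v&s=disk.percent:desc

POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "somenamespace-000001",
        "shard": 0,
        "from_node": "data-node-11",
        "to_node": "data-node-04"
      }
    }
  ]
}

One caveat is that the cluster's own shard-count balancing may move shards back afterwards, so an automated version would probably need to account for that.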
Do you have a hot-warm architecture for your nodes, using shard allocation awareness configurations in elasticsearch.yml?
If you have this kind of configuration enabled, you can force some indices to have their shards allocated to the hot nodes and other indices to the warm nodes.
You can also use custom attributes if you want.
For example, if you have this in your elasticsearch.yml for two of your nodes:
node.attr.node_type: small_indices
You can use the request below to move some indices between those two nodes:
PUT index-name/_settings
{
  "index.routing.allocation.require.node_type": "small_indices"
}
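Once the setting is applied, the existing shards of that index should relocate to the matching nodes. You can confirm where they ended up with something like this (index-name being the same placeholder as above):

GET _cat/shards/index-name?v&h=index,shard,prirep,state,node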