Is there a way to rebalance data nodes by disk space and not shards?

danksim · June 2, 2021, 9:03pm

I have a cluster with 846 indices in total. 273 of them hold only kilobytes of data, while the rest hold around 7GB each because they roll over at 7GB. This is due to the way I index data by k8s namespace (i.e. <namespace>-000001). Pods in some namespaces rarely or never log, so their indices barely have any data.

So even though the shards are distributed evenly across all data nodes, the disk usage is not. Some data nodes are 75-80% full and some are 50% full.

I need a way to address this, maybe by running a script in a k8s cronjob that rebalances by disk usage while keeping an even number of shards per data node (or maybe the number of shards doesn't matter anymore).

warkolm · June 3, 2021, 4:11am

It'll balance by shard count until you start to hit disk watermarks. In your case there's not much you can do other than alter your indexing strategy to compact some of the smaller sources.
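To see how close each node is to the watermarks mentioned above, something like the following sketch could help. It assumes the rows come from `GET _cat/allocation?format=json` and that the cluster uses the default low and high watermarks of 85% and 90%; `classify_nodes` is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Default disk watermarks (percent used); a cluster may override these.
LOW_WATERMARK_PCT = 85
HIGH_WATERMARK_PCT = 90


def classify_nodes(allocation_rows, low=LOW_WATERMARK_PCT, high=HIGH_WATERMARK_PCT):
    """Group node names by where their disk usage sits relative to the watermarks.

    allocation_rows: list of dicts shaped like the JSON output of _cat/allocation,
    each with at least "node" and "disk.percent" (a numeric string or None).
    """
    status = {"ok": [], "above_low": [], "above_high": []}
    for row in allocation_rows:
        pct = row.get("disk.percent")
        if pct is None:
            # The UNASSIGNED row has no disk figures; skip it.
            continue
        pct = int(pct)
        if pct >= high:
            status["above_high"].append(row["node"])
        elif pct >= low:
            status["above_low"].append(row["node"])
        else:
            status["ok"].append(row["node"])
    return status


# Hypothetical sample rows for illustration only.
rows = [
    {"node": "data-0", "disk.percent": "52"},
    {"node": "data-1", "disk.percent": "86"},
    {"node": "data-2", "disk.percent": "91"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
result = classify_nodes(rows)
```

Nodes in `above_low` stop receiving new shards, and nodes in `above_high` are the ones Elasticsearch will try to move shards away from.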

danksim · June 3, 2021, 3:30pm

Thanks for the reply.

The disk watermark has been changed to 10% from 15% and it's been OK so far, but a select few nodes (2-3 out of 13 data nodes) are still near 85% used at the moment.

Not sure if I can just compact smaller sources since they are still considered hot indices (actively being written to or not) according to my indexing strategy.

All warm indices (not actively being written to) are around 7GB after being rolled over and shrunk.

I believe the rebalance based on disk usage can be done manually so I am trying to think of a good way to automate this somehow.
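A rough sketch of what such automation might look like: pick one shard on the fullest node and build a `move` command for the `POST _cluster/reroute` API. The function name `plan_move` and the input shapes are assumptions for illustration; the inputs would have to be gathered separately, for example from `_cat/allocation` and `_cat/shards` with `bytes=b`. This only plans a single move and does not check whether a copy of the same shard already lives on the target node, in which case Elasticsearch would reject the command.

```python
def plan_move(node_used_bytes, shards):
    """Build a _cluster/reroute body moving one shard from the fullest node
    to the emptiest node, or return None if no move would help.

    node_used_bytes: {node_name: bytes_used}
    shards: list of {"index": str, "shard": int, "node": str, "size": int}
    """
    fullest = max(node_used_bytes, key=node_used_bytes.get)
    emptiest = min(node_used_bytes, key=node_used_bytes.get)
    gap = node_used_bytes[fullest] - node_used_bytes[emptiest]

    # Moving a shard of size s changes the gap to (gap - 2s), so only
    # shards smaller than the gap can narrow it.
    candidates = [
        s for s in shards if s["node"] == fullest and 0 < s["size"] < gap
    ]
    if not candidates:
        return None

    # The ideal shard is half the gap in size; take the closest one.
    best = min(candidates, key=lambda s: abs(gap / 2 - s["size"]))
    return {
        "commands": [
            {
                "move": {
                    "index": best["index"],
                    "shard": best["shard"],
                    "from_node": fullest,
                    "to_node": emptiest,
                }
            }
        ]
    }


# Hypothetical numbers for illustration only.
used = {"data-0": 100, "data-1": 40}
shard_list = [
    {"index": "ns-a-000001", "shard": 0, "node": "data-0", "size": 50},
    {"index": "ns-b-000001", "shard": 0, "node": "data-0", "size": 28},
    {"index": "ns-c-000001", "shard": 0, "node": "data-1", "size": 10},
]
body = plan_move(used, shard_list)
```

One caveat: Elasticsearch still rebalances as normal after a reroute, so a manual move may be undone if it leaves the shard counts uneven.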

leandrojmp · June 3, 2021, 4:08pm

Do you have a hot-warm architecture for your nodes, using shard allocation awareness configurations in the elasticsearch.yml?

If you have this kind of configuration enabled, you can force some indices to have their shards distributed among the hot nodes and other indices to have theirs distributed among the warm nodes.

You can also use custom attributes if you want.

For example, if you have this in your elasticsearch.yml for two of your nodes:

node.attr.node_type: small_indices

You can use the request below to move some indices between those two nodes:

PUT index-name/_settings
{
  "index.routing.allocation.require.node_type": "small_indices"
}

This way you can better organize your shards, but
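To apply the setting above to many small indices at once, a sketch like the one below could pick them by size. It assumes the rows come from `GET _cat/indices?format=json&bytes=b`, where `store.size` is a numeric string in bytes. The helper names and the 1 MB threshold are hypothetical.

```python
def small_indices(cat_rows, max_bytes):
    """Return the sorted names of indices whose total store size is at most max_bytes."""
    names = []
    for row in cat_rows:
        size = row.get("store.size")
        if size is None:
            # Closed indices report no size; skip them.
            continue
        if int(size) <= max_bytes:
            names.append(row["index"])
    return sorted(names)


def routing_settings(node_type="small_indices"):
    """Body for PUT <index>/_settings, pinning an index to nodes with this attribute."""
    return {"index.routing.allocation.require.node_type": node_type}


# Hypothetical sample rows for illustration only.
rows = [
    {"index": "quiet-ns-000001", "store.size": "20480"},
    {"index": "busy-ns-000007", "store.size": "7516192768"},
    {"index": "closed-ns-000001", "store.size": None},
]
targets = small_indices(rows, max_bytes=1024 * 1024)  # 1 MB threshold (assumed)
settings = routing_settings()
```

Each name in `targets` would then get the `settings` body sent to its `_settings` endpoint.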

danksim · June 3, 2021, 9:39pm

I will look into this. Thank you @leandrojmp !

