Shard allocation based on shard size

Hi,

I was wondering if there is an option to include shard size in the shard allocation decision process.

The Disk-based shard allocation settings and the number of shards per node work fine, and most of our shards are 50G in size, but sometimes a few 2-3G shards get into the mix. When that happens, ES only looks at the number of shards per node, not at whether the total shard size per node is roughly equal.

Is this something ES can take into account? Or will I need to write my own rebalancing logic for this kind of behaviour?

Thanks!

It currently works on shard count only, which can sometimes lead to lumpy disk use across nodes.

Is it causing issues?

Well yes, I'm trying to keep as much data in ES as possible, but when the disk usage across nodes of a specific type (hot/warm/cold) is not evenly distributed, I waste a lot of disk space :slight_smile:.

I guess I'll need to write a rebalancer based on shard sizes which runs every once in a while.
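For reference, if I do end up writing something myself, I assume it would boil down to comparing disk.indices per node from _cat/allocation and then asking ES to move specific shards with the cluster reroute API, something like this (the index name and shard number below are just placeholders, the node names are from my cluster):

POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "filebeat-2019.11.01",
        "shard": 0,
        "from_node": "elkdatac002",
        "to_node": "elkdatac001"
      }
    }
  ]
}

Not saying that's necessarily a good idea, just what I imagine it would look like.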

How much disk space are you talking about here? Are you able to share the output from _cat/allocation?v?

Part of the node list:

shards disk.indices disk.used disk.avail disk.total disk.percent host            ip          node
   106        4.3tb     4.3tb    639.6gb      4.9tb           87 elkdatac001 x.y.4.13  elkdatac001
   107        4.7tb     4.7tb    216.5gb      4.9tb           95 elkdatac002 x.y.4.14  elkdatac002
   107        4.5tb     4.5tb      410gb      4.9tb           91 elkdatac003 x.y.4.15  elkdatac003

Doesn't look like much wasted space to me :slight_smile:

Since you have quite large disks you might like to consider configuring the disk watermarks differently. The high watermark default of 90% means that Elasticsearch tries to keep 500GB free on each 5TB disk. That's not totally silly: there is some belief that filesystem performance drops once disks get too full, but if you would rather run closer to the wire then that's your call.
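Purely as an illustration (the exact percentages are your call, not a recommendation), raising all three watermarks via the cluster settings API would look something like:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}

With the high watermark at 95%, Elasticsearch would aim to keep roughly 250GB free on each 5TB disk instead of 500GB.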

Elasticsearch aims to keep nodes under the high watermark but will only move shards between nodes when necessary. This means that "evening out" disk usage is deliberately avoided but it doesn't mean that any space is wasted, even if your shards have rather different sizes.

Yes, I know :slight_smile:, I already changed the high watermark, but as you can see in the example, at least 8 shards (of 50G each) could be added to elkdatac001 if total shard size were included in the rebalancing.

Thanks for the answers!

Sure, but does Elasticsearch have a good reason to move any shards onto this node? If so, what is it?

If for instance elkdatac002 were above its high watermark then Elasticsearch would indeed move shards across to balance things out.

Well, the good reason would be "the disks are not filled evenly" :slight_smile:.

The current example I have now is the disk usage on my hot nodes:

shards disk.indices disk.used disk.avail disk.total disk.percent host        ip         node
    84      749.4gb   752.1gb     93.3gb    845.4gb           88 elkdatah001 x.34.4.7   elkdatah001
    88      643.6gb   645.9gb    199.5gb    845.4gb           76 elkdatah002 x.34.4.8   elkdatah002
    87      664.3gb   665.2gb    180.2gb    845.4gb           78 elkdatah003 x.34.4.9   elkdatah003
    87      690.9gb   691.9gb    153.5gb    845.4gb           81 elkdatah004 x.34.4.240 elkdatah004
    87      643.7gb   644.9gb    200.5gb    845.4gb           76 elkdatah005 x.34.4.241 elkdatah005
    87      704.6gb   705.9gb    129.7gb    835.6gb           84 elkdatah006 x.34.4.242 elkdatah006

The first one has a lot of large shards and, as you can see, ES does not allocate any new indices to this node, but the disk numbers are still a problem.

That is why I think it is strange that the total disk size is not taken into account in the relocation logic, but it is also strange that I'm the only person running into this problem :smiley:.

That in itself isn't a good reason to move shards around. Moving a shard is an expensive operation, it's not worth doing simply for the sake of tidiness.

This is what I'm not understanding. I see that the numbers aren't equal, but I don't see why this is a problem. How would your life be better if the numbers were closer together? Is there some operational issue that this unevenness is causing?

You're not the only person to experience this confusion, and we recently expanded the docs on this subject for that reason. Disk space absolutely is taken into account when relocating shards, but that doesn't imply we aim for equal disk usage across nodes. That goal is expensive and unnecessary.

Hi @DavidTurner, thanks for the answers.

After thinking about it for a while, I'm just going to try to reconfigure our own host monitoring. Since Elasticsearch indeed has its own disk usage checks (the watermarks), the hosts will probably become evenly distributed eventually.

The problem we see is that on busy days the Apache Filebeat logs grow a lot, so we need to have some buffer, but we want it to be as small as possible.

But I'll start by removing our own disk checks, thanks again for the feedback!


I see - that is a better reason for relocating shards :slight_smile:

The usual solution is to set the gap between the low and high watermarks to be larger than the typical size of the day's indices on each node, and the gap between the high and flood-stage watermarks to be large enough to allow time to mitigate any overage before disks fill up. This largely works in practice, but it's not completely ideal.
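For illustration only (the numbers below are placeholders and would need to match your actual daily growth, not a recommendation), with absolute values on ~845GB data disks that might look something like:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "150gb",
    "cluster.routing.allocation.disk.watermark.high": "100gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "50gb"
  }
}

Here a node stops receiving new shards once it has less than 150GB free, leaving a 50GB buffer of growth before shards start being moved off at the high watermark, and another 50GB before writes are blocked at the flood stage. Note that absolute values specify free space, and you can't mix byte values and percentages across these settings.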

Well, I basically have (had) two problems: the monitoring system (which I disabled for the ES data mount, because the watermark system works well for this) and the busy days.

Because I use ILM, the watermark system is not really helping me. When the hot nodes are full, I need to manually change the ILM config to make sure the indices get allocated to warm or cold nodes. But that is something I will look at in the future. For now a bigger buffer will do.
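To give an idea of what I mean, the sort of thing I change by hand is the warm-phase allocation in the ILM policy, roughly along these lines (the policy name, min_age and the box_type node attribute are just illustrative placeholders, not my exact config):

PUT _ilm/policy/filebeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "2d",
        "actions": {
          "allocate": { "require": { "box_type": "warm" } }
        }
      }
    }
  }
}

Lowering min_age (or the rollover thresholds) when the hot nodes fill up is what pushes indices to the warm nodes sooner.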


I see, in which case it sounds like you might be looking for #47764. Please feel free to leave a comment (even just a +1) to let us know you'd like us to work on it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.