Hello
I'm using ECK 1.8.0 and ES 7.16.2.
To increase the storage available to an ES cluster managed by the ECK operator, I proceeded in two steps: first I added a new node set with the disk capacity I need and waited for the shards to relocate, then I removed the old node set and waited for the relocation to finish.
I did that for the data node set (3 nodes across 3 zones) as well as for the master node set (also 3 nodes across 3 zones).
I did both at the same time because I also wanted to rename the master node set, so I took the opportunity.
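For context, the change was made by editing the Elasticsearch manifest roughly like this (a simplified sketch, not the exact manifest; the zone attribute, storage class details and exact storage value are assumptions based on the node set name):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-site-search
spec:
  version: 7.16.2
  nodeSets:
  # New data node set with the larger volume; the old data node set was
  # removed from this list once all shards had relocated off of it.
  - name: data140g-zonea
    count: 1
    config:
      node.roles: ["data"]
      node.attr.zone: zone-a        # illustrative zone attribute
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 140Gi          # new, larger capacity (illustrative value)
  # ... the same pattern repeated for zoneb/zonec and for the renamed master node set
```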
However, after the shards relocated, the disk usage on one data node is completely out of balance, even though the large shards are evenly distributed across the nodes. Here is the allocation status (from GET _cat/allocation?v):
You can see that the node with the fewest shards is the one with the disk usage anomaly. To be precise, there is a single large index, and its 6 shards are evenly balanced across the nodes (2 per node).
shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
    70       77.3gb    77.5gb     50.2gb    127.8gb           60 10.208.63. 10.208.63. elastic-site-search-es-data140g-zoneb-0
    69       77.3gb    77.6gb     50.2gb    127.8gb           60 10.208.36. 10.208.36. elastic-site-search-es-data140g-zonec-0
     4       76.2gb   113.5gb     14.3gb    127.8gb           88 10.208.59. 10.208.59. elastic-site-search-es-data140g-zonea-0
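In case it is useful, this is the kind of follow-up I can run to see what is actually sitting on that node's data volume (a sketch: the data path assumes the defaults of the official Elasticsearch image, and ES_URL/PASSWORD are placeholders for the ECK service endpoint and elastic user secret):

```sh
# Compare what the filesystem reports on the overused node with what
# Elasticsearch thinks it is using (pod name taken from the table above).
kubectl exec elastic-site-search-es-data140g-zonea-0 -- df -h /usr/share/elasticsearch/data
kubectl exec elastic-site-search-es-data140g-zonea-0 -- du -h -d 2 /usr/share/elasticsearch/data

# Cross-check with the per-node filesystem stats reported by Elasticsearch.
curl -sk -u "elastic:$PASSWORD" "$ES_URL/_nodes/stats/fs?human&pretty"
```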
Also, I can see this warning in the master node logs:
{"type": "server", "timestamp": "2022-10-20T11:49:47,792Z", "level": "WARN", "component": "o.e.c.r.a.d.DiskThresholdDecider", "cluster.name": "elastic-site-search", "node.name": "elastic-site-search-es-master-zonea-0", "message": "after allocating [[members][2], node[LjEUE5PgTiqB65tygsCQAQ], [R], s[STARTED], a[id=b6KSlUKPSVaWSclfM5IelQ]] node [bCZMIAM6RkS6pqH_MEukjg] would have more than the allowed 10% free disk threshold (9.7% free), preventing allocation", "cluster.uuid": "ykOg819RTCKt73Ehvjylxg", "node.id": "ydGRqsigTbScRiwWZk9ztw" }
What is taking up that space on the overused node? What can I do to unblock the situation?
Thanks!