I just noticed that we have a massive shard imbalance in our 7.1.0 cluster. I'm assuming it happened due to many partial/failed recoveries or similar, and it seems a lot of data has just been left hanging around in path.data.
We have been flirting with the low and sometimes high watermarks for a while, sometimes letting ES resolve it by rebalancing, but sometimes needing to manually intervene by growing one or more of the disks or adding a node.
```
shards disk.indices disk.used disk.avail disk.total disk.percent node
    54        1.7tb     1.8tb    105.6gb      1.9tb           94 es-data4
    47        1.4tb     1.8tb     99.5gb      1.9tb           94 es-data1
    60        1.6tb     1.8tb    113.3gb      1.9tb           94 es-data7
    15        409gb     1.7tb    131.6gb      1.9tb           93 es-data6
    49        1.6tb     1.8tb    116.7gb      1.9tb           94 es-data5
     1         51mb     1.7tb    137.7gb      1.9tb           93 es-data3
    60        1.6tb     1.8tb    111.8gb      1.9tb           94 es-data8
    52        1.6tb     1.8tb    109.4gb      1.9tb           94 es-data2
   113                                                           UNASSIGNED
```
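(For reference, this table is cat allocation output; it can be reproduced with something like the following, assuming the cluster is reachable on localhost:9200.)

```shell
# Per-node shard count and disk usage, sorted by node name:
curl -s 'localhost:9200/_cat/allocation?v&s=node'
```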
The 113 shards (all replicas) are unassigned because all the nodes are above the low watermark, despite two nodes having a huge difference between disk.indices and disk.used.
Our path.data lives on its own volume with nothing else stored there. I have confirmed that the space is being taken by what appear to be shard directories.
My questions are:
- What is the safest way to remedy the current situation with es-data3 and es-data6 (and therefore our yellow, dangerously full cluster state)? Should I manually reroute these 1+15 active shards, stop the nodes, clear the data dirs, then start the nodes back up with empty disks? Could I possibly save a lot of data transfer by NOT clearing out the disks, since they likely have a lot of up-to-date segments within the shards?
- Am I correct in saying that all of the nodes in our cluster are experiencing this problem of orphaned shards in path.data (to a much lesser degree), given that disk.indices and disk.used differ on every node? Is the cluster completely unaware of these shards? If so, is there a relatively easy way to locate which shard directories are orphaned, even on these nodes, and clear them out?
- Is this a deadlock situation in which the replicas are out of date and not syncing because the low watermark is exceeded?
- If this is in fact due to failed shard replication attempts, shouldn't Elasticsearch be cleaning these up?
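On the first question, a gentler alternative to manually rerouting the active shards off es-data3 and es-data6 is to exclude those nodes from allocation and let the cluster drain them itself; once they hold no shards they can be stopped and their data dirs cleared. A sketch, assuming the cluster is reachable on localhost:9200:

```shell
# Tell the allocator to move all shards off es-data3 and es-data6.
curl -s -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "es-data3,es-data6"
  }
}'

# After the nodes have been emptied, cleaned, and restarted, clear the exclusion:
curl -s -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}'
```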
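On the second question, one rough way to locate orphan candidates is to diff the shard directories physically on disk against the shards the cluster says a node holds. A sketch, assuming the default ES 7.x on-disk layout (`path.data/nodes/0/indices/<index-uuid>/<shard>`) and a path.data of /var/lib/elasticsearch; the file names are placeholders:

```shell
# 1. On the node, collect what is physically there, as "<index-uuid>/<shard>":
#      find /var/lib/elasticsearch/nodes/0/indices -mindepth 2 -maxdepth 2 \
#        -type d -printf '%P\n' > on_disk.txt
#
# 2. From the cluster, collect the "<index-uuid>/<shard>" pairs allocated to
#    that node, e.g. by joining the output of
#      curl -s 'localhost:9200/_cat/shards?h=index,shard,node'
#    with
#      curl -s 'localhost:9200/_cat/indices?h=index,uuid'
#    and writing the result to allocated.txt.

# 3. Anything on disk that the cluster does not claim is an orphan candidate:
find_orphans() {
  # lines present only in the on-disk list ($1), not in the allocated list ($2)
  comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Usage: find_orphans on_disk.txt allocated.txt
```

Candidates found this way are worth double-checking against the cluster state before deleting anything, since a recovery in flight can also leave a directory that looks unclaimed.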
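On why the 113 replicas stay unassigned, the cluster allocation explain API will state the blocking decider explicitly (including whether it is the disk watermark). A sketch, assuming localhost:9200; the index name is a placeholder to be replaced with one of the affected indices:

```shell
# Ask the cluster why a given unassigned replica cannot be allocated:
curl -s 'localhost:9200/_cluster/allocation/explain' \
  -H 'Content-Type: application/json' -d '
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}'
```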