Hello,
I'm currently performing a rolling upgrade from ES 5.6.8 to 6.8.4, following the steps described here.
Everything has gone fine so far, but I'm having trouble with disk space on certain nodes. Here is the scenario:
Before upgrade:
Node total disk space: 200 GB.
Total shards in this node: 229
Disk used: 71%.
cluster.routing.allocation.disk.watermark.low: 85% (default)
cluster.routing.allocation.disk.watermark.high: 90% (default)
cluster.routing.allocation.disk.watermark.flood_stage: 95% (default)
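For reference, these are the defaults, which can be confirmed with (Kibana Dev Tools syntax):

```
GET _cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.allocation.disk
```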
Upgrade steps:
- "cluster.routing.allocation.enable": "primaries"
- stop elasticsearch service.
- perform upgrade (at this point all 229 shards are unassigned, as expected)
- start elasticsearch service
- node joins the cluster
- "cluster.routing.allocation.enable": null (back to default so shards assign back to the node)
The problem:
227 shards were assigned to the node, as expected, but the 2 remaining shards are still unassigned. I then checked _cluster/allocation/explain to troubleshoot the issue and found the following:
"deciders": [
{
"decider": "disk_threshold",
"decision": "NO",
"explanation": "allocating the shard to this node will bring the node above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%] and cause it to have less than the minimum required [0b] of free space (free: [49.8gb], estimated shard size: [50gb])"
}
]
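In case it helps, this is the request I used to get that output. With an empty body, the API explains the first unassigned shard it finds; a specific shard can be targeted with an index/shard/primary body (the index name below is just a placeholder):

```
GET _cluster/allocation/explain

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}
```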
This makes perfect sense on its own: ES refuses to allocate the shard to a node that doesn't have enough free space for it. But here is my question: how can this happen? As I stated above, this node already held all of these shards without any problem, with only 71% of its disk used. Why is ES blocking the allocation now if there was no problem before the upgrade? Am I missing something?
Thank you in advance.