Hi,
I've always had the problem that the ECK operator does not reset the shard allocation cluster setting after restarting a node. The result is that after a rolling restart overnight, I end up with hundreds of unallocated replica shards and no redundancy, because transient.cluster.routing.allocation.enable is still set to primaries. This also halts the rolling restart at some point, when no more nodes can be restarted without making the cluster red.
This basically means that for each cluster reboot, I have to stand by all the time and issue a
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": null
}
}
after each node, only for the operator to set it back to primaries as soon as the next node is restarted.
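For reference, this is roughly how the manual reset could be scripted as a stopgap. It is only a minimal sketch: the helper name build_reset_request and the https://localhost:9200 address are placeholders I made up, and authentication and TLS settings are left out because they depend on the cluster. It only builds the same PUT _cluster/settings request shown above; actually sending it is left commented out.

```python
import json
import urllib.request


def build_reset_request(base_url):
    """Build the PUT _cluster/settings request that clears the transient override.

    base_url is a placeholder for the cluster's HTTP endpoint (assumption).
    """
    # JSON null (None in Python) removes the transient setting, so the
    # cluster falls back to its persistent or default allocation behaviour.
    body = {"transient": {"cluster.routing.allocation.enable": None}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_reset_request("https://localhost:9200")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it, credentials and a TLS context for the cluster
    # would have to be added first:
    # urllib.request.urlopen(request)
```

This only automates the workaround; the operator would still set the value back to primaries before the next node restart.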
Is this a known issue? For me this issue has been around since the beginning and no update has solved it so far.