Unable to cancel allocation of primary shard

We currently have a small issue in our cluster: at some point one or more nodes were shut down uncleanly, leaving replicas and primaries out of sync (symptoms like those described in #12661).
For some shards I can remedy the situation quite easily by performing a forced flush and allocating the replicas as described in the ticket. For other shards, however, the replica has more documents than the primary. This probably happened when the host holding the primary with more documents died and the old replica, which was not fully up to date, got promoted to the new primary. For these shards, if I perform a flush and reassign the replica, I will lose those extra documents.
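
Concretely, the per-shard recipe from the ticket boils down to something like this (node2 as the allocation target is just an example name):

POST metricbeat-2017.10.30/_flush?force=true

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "metricbeat-2017.10.30",
        "shard": 4,
        "node": "node2"
      }
    }
  ]
}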

When I try to promote the replica as a stale primary, the cluster (rightfully so) complains that there already is a primary and won't promote a stale shard over it.
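
For reference, the promotion attempt looked roughly like this (node2 stands for the node holding the stale copy; accept_data_loss is required for this command):

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "metricbeat-2017.10.30",
        "shard": 4,
        "node": "node2",
        "accept_data_loss": true
      }
    }
  ]
}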

So next I stopped the node holding the primary and was then able to assign the replica as the new primary, bring the old primary back online, force a flush, and assign it as a replica. All documents safe, everybody happy.

However, since there are additional shards to fix and I'd like to avoid too many restarts, I tried cancelling the allocation of the primary in order to be able to promote the replica.

When I do

POST _cluster/reroute?retry_failed=true  
{
  "commands": [
    {
      "cancel": {
        "index": "metricbeat-2017.10.30",
        "shard": 4,
        "node": "node1",
        "allow_primary": true
      }
    }
  ]
}

The response shows the primary for shard 4 as unassigned:

"4": [
              {
                "state": "UNASSIGNED",
                "primary": true,
                "node": null,
                "relocating_node": null,
                "shard": 4,
                "index": "metricbeat-2017.10.30",
                "recovery_source": {
                  "type": "EXISTING_STORE"
                },
                "unassigned_info": {
                  "reason": "REROUTE_CANCELLED",
                  "at": "2017-11-01T17:25:35.780Z",
                  "delayed": false,
                  "allocation_status": "fetching_shard_data"
                }
              },
              {
                "state": "UNASSIGNED",
                "primary": false,
                "node": null,
                "relocating_node": null,
                "shard": 4,
                "index": "metricbeat-2017.10.30",
                "recovery_source": {
                  "type": "PEER"
                },
                "unassigned_info": {
                  "reason": "REROUTE_CANCELLED",
                  "at": "2017-11-01T17:24:43.555Z",
                  "delayed": false,
                  "allocation_status": "no_attempt"
                }
              }
            ]

However, _cat/shards shows the primary as started, and I can query the index and receive no failed shards in the response. I've disabled shard allocation to avoid the shard being cancelled and then immediately reassigned, but that doesn't change anything.
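
The check is simply:

GET _cat/shards/metricbeat-2017.10.30?v

which still lists the shard 4 primary as STARTED on node1.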

Probably I am misunderstanding how the cancel command is supposed to work, or I can't read the response correctly, but as far as I understand it, this should have worked.
Maybe someone can shed some light on this?

Update:
Having looked at the code, it seems to me that the cancel command can only cancel the allocation of a primary that is still in the initializing state, not one that is already started. It also looks like "allow_primary" has to be set to false to allow cancelling a primary allocation.
This would explain what I am seeing, but the question remains: how can I unassign a primary shard without taking the whole node offline?
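
In case it helps anyone diagnose this, the allocation explain API can be asked about the primary copy directly:

GET _cluster/allocation/explain
{
  "index": "metricbeat-2017.10.30",
  "shard": 4,
  "primary": true
}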
