We hit exactly the same issue, about a week after turning on ILM for the first time.
Captured the allocation explain output and the current shard allocation of the source index from the shrink operation:
/_cluster/allocation/explain output
{
"note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
"index": "shrink-c0tp-v60.agentprocessevent@1m-001820",
"shard": 7,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "INDEX_CREATED",
"at": "2022-12-18T22:53:44.659Z",
"last_allocation_status": "no"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions": [
{
"node_id": "2BvqNyIjT3W0KfWvC9sOkg",
"node_name": "elasticsearch-0-es-warm-19",
"transport_address": "10.64.130.4:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-dqwm",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 15,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "fpIRkV8qTDO29ys5r_wx_g",
"node_name": "elasticsearch-0-es-warm-17",
"transport_address": "10.64.182.6:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-hjaj",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 16,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "r8JQSVObQ42Bu60BEeGVMg",
"node_name": "elasticsearch-0-es-warm-5",
"transport_address": "10.64.169.6:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-vm1a",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 17,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "bYEGLS8qS1OBJ62l3KuKmw",
"node_name": "elasticsearch-0-es-warm-8",
"transport_address": "10.64.171.5:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-6bhd",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 18,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "iEPAh6X4T5SZD3zKcTE4TQ",
"node_name": "elasticsearch-0-es-warm-3",
"transport_address": "10.64.153.3:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-bdb36591-2rhi",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-b",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 19,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "RuGVVrmBQomXmzJrfb-Hvw",
"node_name": "elasticsearch-0-es-warm-2",
"transport_address": "10.64.174.5:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-sybb",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 20,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "gbQY19IgTo28ABTg8QvyFw",
"node_name": "elasticsearch-0-es-warm-9",
"transport_address": "10.64.150.7:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-bdb36591-vgxz",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-b",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 21,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "J1hIgk9zSpCydd2BMcoFTA",
"node_name": "elasticsearch-0-es-warm-1",
"transport_address": "10.64.152.5:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-13n8",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 22,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "rqS7V2UyTuWbsXOe9piKZw",
"node_name": "elasticsearch-0-es-warm-13",
"transport_address": "10.64.164.3:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-uzkm",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 23,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "5FMircURTdiSWKuLC2a4zw",
"node_name": "elasticsearch-0-es-warm-15",
"transport_address": "10.64.143.4:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-bdb36591-uzmj",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-b",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 24,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "UJwokT4-Rc-g5Jy87BYTcg",
"node_name": "elasticsearch-0-es-warm-7",
"transport_address": "10.64.159.5:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-iiky",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 29,
"deciders": [
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "9OGUbZEIRzqOPsRgEWnJQw",
"node_name": "elasticsearch-0-es-warm-4",
"transport_address": "10.64.179.4:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-3jre",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 30,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "Kqu5vRSJQL2_886fTYxOJw",
"node_name": "elasticsearch-0-es-warm-18",
"transport_address": "10.64.158.4:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-qsr0",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 31,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "rcb_e-0BQ_yuMaszbRHeuA",
"node_name": "elasticsearch-0-es-warm-16",
"transport_address": "10.64.154.5:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-2898672c-1j4s",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-c",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 32,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "45jcOptdTjqnA_fG1punmg",
"node_name": "elasticsearch-0-es-warm-0",
"transport_address": "10.64.131.7:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-bdb36591-blqz",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-b",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 33,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
},
{
"node_id": "KpGc5O6eTpKT5Tr4tTW0uQ",
"node_name": "elasticsearch-0-es-warm-6",
"transport_address": "10.64.142.4:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-bdb36591-94ar",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-b",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 34,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
}
]
},
{
"node_id": "enaoon0EQZ2UYUoeSZmaDA",
"node_name": "elasticsearch-0-es-warm-14",
"transport_address": "10.64.175.6:9300",
"node_attributes": {
"k8s_node_name": "gke-eu-cluster-0-e2-gen4-22399c39-ky6l",
"warm": "true",
"xpack.installed": "true",
"zone": "europe-west1-d",
"transform.node": "false"
},
"node_decision": "no",
"weight_ranking": 35,
"deciders": [
{
"decider": "resize",
"decision": "NO",
"explanation": "source primary is allocated on another node"
},
{
"decider": "filter",
"decision": "NO",
"explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"KpGc5O6eTpKT5Tr4tTW0uQ\"] that hold a copy of every shard in the index"
}
]
}
]
}
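Since the output is long, here is a minimal sketch for summarising it: it groups the nodes by which deciders answered NO. The function name `nodes_by_blocking_deciders` and the trimmed sample are my own illustration, not part of any Elasticsearch client; only the field names come from the response above.

```python
from typing import Dict, List


def nodes_by_blocking_deciders(explain: dict) -> Dict[str, List[str]]:
    """Group node names by the sorted set of deciders that answered NO."""
    groups: Dict[str, List[str]] = {}
    for node in explain.get("node_allocation_decisions", []):
        blockers = sorted(
            d["decider"] for d in node.get("deciders", []) if d["decision"] == "NO"
        )
        key = "+".join(blockers) or "none"
        groups.setdefault(key, []).append(node["node_name"])
    return groups


# Trimmed-down sample with the same shape as the response above
# (three of the seventeen nodes, explanations omitted).
sample = {
    "node_allocation_decisions": [
        {"node_name": "elasticsearch-0-es-warm-19",
         "deciders": [{"decider": "resize", "decision": "NO"},
                      {"decider": "filter", "decision": "NO"}]},
        {"node_name": "elasticsearch-0-es-warm-7",
         "deciders": [{"decider": "filter", "decision": "NO"}]},
        {"node_name": "elasticsearch-0-es-warm-6",
         "deciders": [{"decider": "resize", "decision": "NO"}]},
    ]
}

summary = nodes_by_blocking_deciders(sample)
for blockers, names in sorted(summary.items()):
    print(f"{blockers}: {', '.join(names)}")
```

On the full output this leaves es-warm-7 as the only node blocked by the filter decider alone, and es-warm-6 (the node named in the filter explanation) as the only node blocked by the resize decider alone. As far as I can tell, that means a source primary this shard depends on is sitting on es-warm-7 while the shrink is pinned to es-warm-6, so no node can satisfy both deciders.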
I tried manually rerouting the primary shards to es-warm-6, but the cluster reroute failed because there was already an active replica of the shard on that node. When I tried to reroute that replica instead, it was not allowed because of the require._id allocation setting set by the shrink.
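For anyone who wants to try the same thing, this is roughly the shape of the reroute request, as a sketch only. `SOURCE_INDEX` and the shard number are placeholders because the source index details are not in this comment, and the `from_node` value is my inference from the explain output rather than something confirmed. The helper `move_command` is hypothetical; it only builds the request body for a `move` command and sends nothing.

```python
import json


def move_command(index: str, shard: int, from_node: str, to_node: str) -> dict:
    """Build a _cluster/reroute request body containing a single move command."""
    return {
        "commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
        ]
    }


# Placeholders: the real source index name and shard number are not given here.
body = move_command(
    index="SOURCE_INDEX",
    shard=0,
    from_node="elasticsearch-0-es-warm-7",  # assumed location of the source primary
    to_node="elasticsearch-0-es-warm-6",    # node pinned by require._id
)
print("POST /_cluster/reroute")
print(json.dumps(body, indent=2))
```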
I then tried restarting the shrink by deleting the (currently broken) target index, removing the require._id allocation setting, and manually moving ILM to the set-single-node-allocation step. This picked a new node for the shrink and moved all the shards onto it as normal, but when it tried to shrink it hit exactly the same error. In the end I removed the ILM policy from this index and deleted the broken target in order to restore the cluster to green health.
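The restart and the final cleanup map onto REST calls roughly as follows. This is a sketch under assumptions, not a recommended fix (the restart did not work for us). The index names are placeholders, the `warm` phase is a guess based on the node names, and `current_step` has to be copied from `GET /<index>/_ilm/explain` because the move API rejects a mismatch. The helpers only build (method, path, body) tuples and send nothing.

```python
import json
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def restart_shrink_requests(source: str, target: str, current_step: dict) -> List[Request]:
    """Requests for the manual shrink restart described above."""
    return [
        # 1. Delete the broken shrink target.
        ("DELETE", f"/{target}", None),
        # 2. Clear the node pin the shrink action left on the source index
        #    (setting a value to null resets it).
        ("PUT", f"/{source}/_settings",
         {"index.routing.allocation.require._id": None}),
        # 3. Move ILM back so it selects a node again.
        ("POST", f"/_ilm/move/{source}", {
            "current_step": current_step,
            "next_step": {"phase": current_step["phase"], "action": "shrink",
                          "name": "set-single-node-allocation"},
        }),
    ]


def abandon_shrink_requests(source: str, target: str) -> List[Request]:
    """Requests for the fallback: detach the ILM policy, drop the broken target."""
    return [
        ("POST", f"/{source}/_ilm/remove", None),
        ("DELETE", f"/{target}", None),
    ]


# Placeholder values; take the real step from the ILM explain output.
step = {"phase": "warm", "action": "shrink", "name": "CURRENT_STEP_NAME"}
requests_to_send = restart_shrink_requests("SOURCE_INDEX", "SHRINK_TARGET_INDEX", step)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body) if body is not None else "")
```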