Shrink API "must have all shards allocated on the same node to shrink" error

I am using the following doc:

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-shrink-index.html

on Elasticsearch 6.3.0-1.

When I run this I get an ack message back:

PUT filebeat-6.0.0-2018.12.17/_settings
{
  "settings": {
    "index.routing.allocation.require._ip": "172.16.99.212",
    "index.blocks.write": true
  }
}

However, when I run this:

POST filebeat-6.0.0-2018.12.17/_shrink/filebeatnew-6.0.0-2018.12.17
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.codec": "best_compression"
  },
  "aliases": {
    "my_search_indices": {}
  }
}

I get the following error:

"type": "illegal_state_exception",
"reason": "index filebeat-6.0.0-2018.12.17 must have all shards allocated on the same node to shrink index"

When I do a cluster health check nothing gets reallocated, so I'm confused as to why the shards are not getting shifted around for the shrink API to work correctly. I have a 3-node cluster where two of them are masters and the other one is a data node. The index currently has 5 shards and 1 replica, and I'm trying to take it down to 1 primary and 1 replica.

Thanks

The ack you receive from this step indicates that the cluster has accepted the settings update, not that it has finished relocating all the shards onto the one node. You need to wait for that to happen too, perhaps using the wait_for_no_relocating_shards option of the cluster health API.
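
For example, a cluster health request along these lines (the 60s timeout is just an illustration) will wait until there are no relocating shards left, or until the timeout expires:

GET _cluster/health/filebeat-6.0.0-2018.12.17?wait_for_no_relocating_shards=true&timeout=60s

If the response reports timed_out: true or a non-zero relocating_shards count, the shards have not yet finished moving onto the target node.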

Hey David,

I'm a bit confused here. After executing the first step I check cluster health and no shards are getting reallocated at all, so I don't think it's moving them around for the second command to work properly.

In order to verify nothing is getting moved around I also went to the monitoring page in Kibana and looked at the shard legend page for that particular index, and I still see all shards on the same nodes as they were before running the command.
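
(The same check can also be done from the API; a sketch, with the column list just for illustration:

GET _cat/shards/filebeat-6.0.0-2018.12.17?v&h=index,shard,prirep,state,node

This lists each shard copy of that index together with the node it is currently on.)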

I see. Curious. Can you use the allocation explain API to find out more about why the shards are not moving? It may help to set include_yes_decisions on this API too.

When I run the command and then run the allocation explain API, I get this:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[2sxScBp][172.16.99.212:9300][cluster:monitor/allocation/explain]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=true]"
  },
  "status": 400
}

Sure, the shards are assigned, just to the wrong nodes, so you'll need to ask it about a specific shard:

GET /_cluster/allocation/explain?include_yes_decisions
{
  "index": "filebeat-6.0.0-2018.12.17",
  "shard": 0,
  "primary": true
}

It's the 4th shard that's not moving over; the rest are on node 1. Here is its allocation explanation:

{
  "index": "filebeat-6.0.0-2018.12.17",
  "shard": 4,
  "primary": true,
  "current_state": "started",
  "current_node": {
    "id": "SvZv1kUgQcK-qx5QbvpcyA",
    "name": "SvZv1kU",
    "transport_address": "172.16.99.213:9300",
    "attributes": {
      "ml.machine_memory": "109830402048",
      "ml.max_open_jobs": "20",
      "xpack.installed": "true",
      "ml.enabled": "true"
    }
  },
  "can_remain_on_current_node": "no",
  "can_remain_decisions": [
    {
      "decider": "filter",
      "decision": "NO",
      "explanation": """node does not match index setting [index.routing.allocation.require] filters [_name:"lxc-elastic-01",_ip:"172.16.99.212"]"""
    },
    {
      "decider": "disk_threshold",
      "decision": "YES",
      "explanation": "there is enough disk on this node for the shard to remain, free: [141.4gb]"
    },
    {
      "decider": "shards_limit",
      "decision": "YES",
      "explanation": "total shard limits are disabled: [index: -1, cluster: -1] <= 0"
    },
    {
      "decider": "awareness",
      "decision": "YES",
      "explanation": "allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it"
    }
  ],
  "can_move_to_other_node": "no",
  "move_explanation": "cannot move shard to another node, even though it is not allowed to remain on its current node",
  "node_allocation_decisions": [
    {
      "node_id": "ShECWbXqSLGNo-fM8zfe5w",
      "node_name": "ShECWbX",
      "transport_address": "172.16.99.211:9300",
      "node_attributes": {
        "ml.machine_memory": "109830402048",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "YES",
          "explanation": "shard has no previous failures"
        },
        {
          "decider": "replica_after_primary_active",
          "decision": "YES",
          "explanation": "shard is primary and can be allocated"
        },
        {
          "decider": "enable",
          "decision": "YES",
          "explanation": "all allocations are allowed"
        },
        {
          "decider": "node_version",
          "decision": "YES",
          "explanation": "can relocate primary shard from a node with version [6.3.0] to a node with equal-or-newer version [6.3.0]"
        },
        {
          "decider": "snapshot_in_progress",
          "decision": "YES",
          "explanation": "no snapshots are currently running"
        },
        {
          "decider": "restore_in_progress",
          "decision": "YES",
          "explanation": "ignored as shard is not being recovered from a snapshot"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """node does not match index setting [index.routing.allocation.require] filters [_name:"lxc-elastic-01",_ip:"172.16.99.212"]"""
        },
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[filebeat-6.0.0-2018.12.17][4], node[ShECWbXqSLGNo-fM8zfe5w], [R], s[STARTED], a[id=dbr3T0SCT8Co4DLkkB7Hdg]]"
        },
        {
          "decider": "disk_threshold",
          "decision": "YES",
          "explanation": "enough disk for shard on node, free: [141.4gb], shard size: [4.1mb], free after allocating shard: [141.4gb]"
        },
        {
          "decider": "throttling",
          "decision": "YES",
          "explanation": "below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2]"
        },
        {
          "decider": "shards_limit",
          "decision": "YES",
          "explanation": "total shard limits are disabled: [index: -1, cluster: -1] <= 0"
        },
        {
          "decider": "awareness",
          "decision": "YES",
          "explanation": "allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it"
        }
      ]
    },
    {
      "node_id": "2sxScBp3Rr6HBME8nQXQiw",
      "node_name": "2sxScBp",
      "transport_address": "172.16.99.212:9300",
      "node_attributes": {
        "ml.machine_memory": "109830402048",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "YES",
          "explanation": "shard has no previous failures"
        },
        {
          "decider": "replica_after_primary_active",
          "decision": "YES",
          "explanation": "shard is primary and can be allocated"
        },
        {
          "decider": "enable",
          "decision": "YES",
          "explanation": "all allocations are allowed"
        },
        {
          "decider": "node_version",
          "decision": "YES",
          "explanation": "can relocate primary shard from a node with version [6.3.0] to a node with equal-or-newer version [6.3.0]"
        },
        {
          "decider": "snapshot_in_progress",
          "decision": "YES",
          "explanation": "no snapshots are currently running"
        },
        {
          "decider": "restore_in_progress",
          "decision": "YES",
          "explanation": "ignored as shard is not being recovered from a snapshot"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": """node does not match index setting [index.routing.allocation.require] filters [_name:"lxc-elastic-01",_ip:"172.16.99.212"]"""
        },
        {
          "decider": "same_shard",
          "decision": "YES",
          "explanation": "the shard does not exist on the same node"
        },
        {
          "decider": "disk_threshold",
          "decision": "YES",
          "explanation": "enough disk for shard on node, free: [141.4gb], shard size: [4.1mb], free after allocating shard: [141.4gb]"
        },
        {
          "decider": "throttling",
          "decision": "YES",
          "explanation": "below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2]"
        },
        {
          "decider": "shards_limit",
          "decision": "YES",
          "explanation": "total shard limits are disabled: [index: -1, cluster: -1] <= 0"
        },
        {
          "decider": "awareness",
          "decision": "YES",
          "explanation": "allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it"
        }
      ]
    }
  ]
}

Thanks, that's what we're after. This tells us that it is trying to move the shard but can't find a suitable place to move it to.

Both nodes report this as the reason they are not a suitable location:

node does not match index setting [index.routing.allocation.require] filters [_name:"lxc-elastic-01",_ip:"172.16.99.212"]

It looks like you are trying to move it to a node with IP address 172.16.99.212 and name lxc-elastic-01, but this does not match any of your nodes. The only node with that IP address has name 2sxScBp and not lxc-elastic-01:

      "node_name": "2sxScBp",
      "transport_address": "172.16.99.212:9300",

Hey David,

This is interesting, because I'm using the IP filter:

"index.routing.allocation.require._ip": "172.16.99.212"

and when I changed it to this, with the matching node name, it worked:

"index.routing.allocation.require._name": "2sxScBp"

Thanks a million for your help, much appreciated.
