How to resolve the below unassigned shards?

How do I resolve the unassigned shards shown below?

What happened: one of the nodes in the cluster was down for 2 days, and I brought it back up.

After that, I could see that shards had become unassigned.

I have since taken down another node because of a disk space issue, and allocation is not happening for a few shards. I don't want this node back, as I have added a replacement node.

_cluster/allocation/explain?pretty

output:

{
    "index": "prodtemp",
    "shard": 0,
    "primary": false,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "NODE_LEFT",
        "at": "2021-03-20T19:27:19.950Z",
        "details": "node_left [4cZwIGYbTYe5YTMQpg1fBg]",
        "last_allocation_status": "no_attempt"
    },
    "can_allocate": "no",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [
        {
            "node_id": "xxxxxxxxxxxxxxxxxxxx",
            "node_name": "xxxxxxxxxxx",
            "transport_address": "xx.xx.1.xx:9300",
            "node_attributes": {
                "ml.machine_memory": "33731194880",
                "xpack.installed": "true",
                "transform.node": "true",
                "ml.max_open_jobs": "20"
            },
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "node_version",
                    "decision": "NO",
                    "explanation": "cannot allocate replica shard to a node with version [7.9.2] since this is older than the primary version [7.11.2]"
                }
            ]
        },
        {
            "node_id": "xxxxxxxxxxxxxx",
            "node_name": "yyyyyyyyyyyyy",
            "transport_address": "xx.xx.1.xxx:9300",
            "node_attributes": {
                "ml.machine_memory": "33731170304",
                "ml.max_open_jobs": "20",
                "xpack.installed": "true",
                "transform.node": "true"
            },
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "node_version",
                    "decision": "NO",
                    "explanation": "cannot allocate replica shard to a node with version [7.10.1] since this is older than the primary version [7.11.2]"
                }
            ]
        },
        {
            "node_id": "xxxxxxx",
            "node_name": "zzzzzzzzzzzzz",
            "transport_address": "xx.xxx.1.xx:9300",
            "node_attributes": {
                "ml.machine_memory": "33731194880",
                "ml.max_open_jobs": "20",
                "xpack.installed": "true",
                "transform.node": "true"
            },
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "node_version",
                    "decision": "NO",
                    "explanation": "cannot allocate replica shard to a node with version [7.9.2] since this is older than the primary version [7.11.2]"
                }
            ]
        },
        {
            "node_id": "xx55555",
            "node_name": "aaaaaaaaaaaaaa",
            "transport_address": "xx.xxx.1.xx:9300",
            "node_attributes": {
                "ml.machine_memory": "33731174400",
                "ml.max_open_jobs": "20",
                "xpack.installed": "true",
                "ml.max_jvm_size": "21474836480",
                "transform.node": "true"
            },
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node [[prodtempcstempdata][0], node[lwFqKkuPQPm3_UYN9DU9VQ], [P], s[STARTED], a[id=w-uD-aw_RvWCYG47uvR88w]]"
                }
            ]
        }
    ]
}
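
With several nodes in the response, it can help to reduce the output to one line per blocking decider. The following is a minimal Python sketch, not an official tool: `summarize_blockers` is a hypothetical helper name, and `sample` is a trimmed copy of the response above that keeps only the fields the helper reads.

```python
def summarize_blockers(explain):
    """Return (node_name, decider, explanation) for every NO decision."""
    rows = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                rows.append(
                    (node["node_name"], decider["decider"], decider["explanation"])
                )
    return rows


# Trimmed copy of the allocation explain response above (two of the four nodes).
sample = {
    "index": "prodtemp",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "xxxxxxxxxxx",
            "deciders": [
                {
                    "decider": "node_version",
                    "decision": "NO",
                    "explanation": "cannot allocate replica shard to a node with version [7.9.2] since this is older than the primary version [7.11.2]",
                }
            ],
        },
        {
            "node_name": "aaaaaaaaaaaaaa",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node [[prodtempcstempdata][0], node[lwFqKkuPQPm3_UYN9DU9VQ], [P], s[STARTED], a[id=w-uD-aw_RvWCYG47uvR88w]]",
                }
            ],
        },
    ],
}

for name, decider, why in summarize_blockers(sample):
    print(f"{name}: {decider}: {why}")
```

Read this way, the replica is blocked by `node_version` on the older nodes and by `same_shard` on the node that already holds the primary.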

It looks like not all nodes in your cluster are running exactly the same version of Elasticsearch, and a shard on a newer node cannot be replicated to an older one. Upgrade any older nodes to the same version as the newest one in the cluster.
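
To see which nodes are behind, one option is to compare the versions reported by `GET _cat/nodes?h=name,version&format=json`. The sketch below assumes rows of that shape; the node names are placeholders, the versions are the ones named in the explain output above, and `parse_version` and `nodes_to_upgrade` are hypothetical helper names. It also assumes plain numeric versions such as `7.10.1`, with no suffix.

```python
def parse_version(text):
    """Turn '7.10.1' into (7, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def nodes_to_upgrade(rows):
    """Return (newest_version, sorted names of nodes running an older version)."""
    newest = max(rows, key=lambda row: parse_version(row["version"]))["version"]
    behind = sorted(
        row["name"]
        for row in rows
        if parse_version(row["version"]) < parse_version(newest)
    )
    return newest, behind


# Rows shaped like the JSON output of GET _cat/nodes?h=name,version&format=json.
# Node names are placeholders, not the real names from this cluster.
rows = [
    {"name": "node-1", "version": "7.10.1"},
    {"name": "node-2", "version": "7.9.2"},
    {"name": "node-3", "version": "7.9.2"},
    {"name": "node-4", "version": "7.11.2"},
]

target, behind = nodes_to_upgrade(rows)
print(f"upgrade {behind} to {target}")
```

The versions are compared as integer tuples rather than strings because a plain string comparison would rank `7.9.2` above `7.10.1`.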

I can see the following versions:
node 1:
  "version" : {
    "number" : "7.10.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "xxxxxxxxxxxxxxxxxxx",
    "build_date" : "2020-12-05T01:00:33.671820Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },

node 2:
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "xxxxxxxxxxxxxxfffffff",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },

node 3:

  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "fffffffffffffffff",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },

node 4:

 "version" : {
    "number" : "7.11.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4fffffffffffdddddd",
    "build_date" : "2021-03-06T05:54:38.141101Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },

The primary is on node 4. Should I upgrade the other nodes to the same version? Is there any way to manually allocate the shards without upgrading?

We want to avoid downtime, as this is a production system with heavy traffic.

Also, if upgrading is mandatory, please share the recommended approach for upgrading Elasticsearch without losing data.

All nodes need to be upgraded to exactly the same version. Primary shards on a newer node cannot be replicated to an older node in any way, as Lucene cannot read shards created on newer versions.

Ok understood.

Is there a recommended approach for upgrading the other 3 nodes in the cluster without losing data?

Is there an API available to upgrade the version?

Please share any documentation on upgrading safely.

I would recommend you perform a rolling upgrade one node at a time.
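
As a rough outline of what a rolling upgrade involves for each node, the sketch below lists the REST calls typically made before and after restarting a node. It only builds and prints the requests and sends nothing; the function names are hypothetical, and stopping, upgrading and starting the node itself is a manual step that depends on how Elasticsearch was installed (the `deb` package, in this cluster). Check the rolling upgrade guide for your target version before relying on it.

```python
import json


def before_node_restart():
    """Requests to send before stopping a node, as (method, path, body)."""
    return [
        # Keep primaries allocatable, but stop replicas from being moved around
        # while the node is down.
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": "primaries"}}),
        # Flush so that shard recovery after the restart is faster.
        ("POST", "/_flush", None),
    ]


def after_node_restart():
    """Requests to send once the upgraded node has rejoined the cluster."""
    return [
        # Setting the value to null restores the default (all shards).
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": None}}),
        # Wait for recovery before moving on to the next node.
        ("GET", "/_cluster/health?wait_for_status=green&timeout=60s", None),
    ]


def show(steps):
    for method, path, body in steps:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))


show(before_node_restart())
print("# manual step: stop the node, upgrade the package, start the node")
show(after_node_restart())
```

While versions are still mixed, replicas of primaries held on upgraded nodes cannot be assigned to older nodes, so the cluster may stay yellow until the last node is upgraded. In that case the health call times out rather than reporting green, and you can continue once no shards are initializing or relocating.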
