Shards unassigned after datanode addition

Hi,
I added a data node to my cluster, and some shards became unassigned afterwards. 2 primary and 2 replica shards are shown as UNASSIGNED.

Output of GET

{
  "index": "other-administrator-2017.03.09",
  "shard": 3,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2019-07-16T05:01:51.383Z",
    "details": "node_left[B2GcZ5KFRG23IMePeJWA4g]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "0MdxLk76SNCwFezbM8Uybw",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd02.nfvi.localdomain",
      "transport_address": "172.17.41.146:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "7Vhv_D4oR_2KtDQpWuxtPw",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd05.nfvi.localdomain",
      "transport_address": "172.17.41.149:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "B2GcZ5KFRG23IMePeJWA4g",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd01.nfvi.localdomain",
      "transport_address": "172.17.41.145:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "cC9S519DT0WvEg-dDrivzA",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd04.nfvi.localdomain",
      "transport_address": "172.17.41.148:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "eaIb-E2ET5-hqTUs9VQRqg",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd03.nfvi.localdomain",
      "transport_address": "172.17.41.147:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}

This output gives the reason as NODE_LEFT. What could have caused that?

The primary was on a node with ID B2GcZ5KFRG23IMePeJWA4g a couple of days ago:

This node no longer has any trace of this shard:

I think you have done something more destructive than simply adding a node to this cluster, because that would not cause what we're seeing here.
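
To run the same check on the other unassigned shards, the explain output can be summarised with a small script. This is only a minimal Python sketch, assuming each response has the same shape as the output posted above. The function name `summarize_explain` and the trimmed `sample` dictionary are made up for illustration.

```python
import re


def summarize_explain(explain: dict) -> dict:
    """Summarise one allocation-explain style response (same shape as the posted output)."""
    details = explain.get("unassigned_info", {}).get("details", "")
    # "details" looks like "node_left[<node id>]"; extract the id if it is there.
    match = re.search(r"node_left\[([^\]]+)\]", details)
    departed = match.group(1) if match else None

    decisions = explain.get("node_allocation_decisions", [])
    present = {d["node_id"] for d in decisions}
    # Nodes that report an on-disk copy of the shard ("store.found" is true).
    with_copy = [d["node_id"] for d in decisions if d.get("store", {}).get("found")]

    return {
        "departed_node": departed,
        "departed_node_is_back": departed in present,
        "nodes_with_shard_copy": with_copy,
    }


# Trimmed sample built from values in the output posted above (two of the five nodes).
sample = {
    "index": "other-administrator-2017.03.09",
    "shard": 3,
    "primary": True,
    "unassigned_info": {
        "reason": "NODE_LEFT",
        "details": "node_left[B2GcZ5KFRG23IMePeJWA4g]",
    },
    "node_allocation_decisions": [
        {"node_id": "0MdxLk76SNCwFezbM8Uybw", "store": {"found": False}},
        {"node_id": "B2GcZ5KFRG23IMePeJWA4g", "store": {"found": False}},
    ],
}

summary = summarize_explain(sample)
print(summary)
```

If `departed_node_is_back` is true while `nodes_with_shard_copy` is empty, that matches the situation described here: the node that held the primary is in the cluster again, but without the shard data.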

What is the most probable cause? Can you tell me what to debug?

Pretty hard to say, sorry. Elasticsearch wouldn't have deleted the missing shard data, but we can't really comment on what else might have done so.

If you have all the logs from all the machines for July 16th, they might help to work out what happened.

I found the following error in the master node logs.

[2019-07-16T09:04:20,030][WARN ][o.e.g.GatewayAllocator$InternalReplicaShardAllocator] [dxb-dso01-nec-nfvi2-cmn-nec-celshm01.nfvi.localdomain] [other-administrator-2019.08.13][2]: failed to list shard for shard_store on node [7Vhv_D4oR_2KtDQpWuxtPw]

Caused by: org.elasticsearch.transport.RemoteTransportException: [datanode05][172.17.41.149:9300][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[other-administrator-2019.08.13][2]]
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:113) ~[elasticsearch-6.4.1.jar:6.4.1]
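
To see whether warnings like this cluster on one data node, the master log can be scanned for them. The following is a rough Python sketch, not an Elasticsearch tool. It assumes log lines shaped like the warning above, and the names `parse_line` and `failures_per_node` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the warning above; the leading "[" is optional.
LINE_RE = re.compile(
    r"\[?(?P<ts>[\d\-T:,]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+)\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]:\s*"
    r"(?P<msg>.*)"
)
# The node the master failed to query is named at the end of the message.
TARGET_RE = re.compile(r"on node \[([^\]]+)\]")


def parse_line(line):
    """Return a dict of fields for a matching log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    rec = m.groupdict()
    rec["shard"] = int(rec["shard"])
    target = TARGET_RE.search(rec["msg"])
    rec["target_node"] = target.group(1) if target else None
    return rec


def failures_per_node(lines):
    """Count WARN lines per target node id."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "WARN" and rec["target_node"]:
            counts[rec["target_node"]] += 1
    return counts


# Sample line taken from the warning posted above.
sample_line = (
    "[2019-07-16T09:04:20,030][WARN ]"
    "[o.e.g.GatewayAllocator$InternalReplicaShardAllocator] "
    "[dxb-dso01-nec-nfvi2-cmn-nec-celshm01.nfvi.localdomain] "
    "[other-administrator-2019.08.13][2]: "
    "failed to list shard for shard_store on node [7Vhv_D4oR_2KtDQpWuxtPw]"
)

record = parse_line(sample_line)
print(record["index"], record["shard"], record["target_node"])
print(failures_per_node([sample_line, "some unrelated line"]))
```

In practice the list of lines would come from reading the master node's log file. A count that is heavily skewed towards one node id would point at that machine as the place to look first.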

Maybe something went wrong with your hard drive? Is that the only thing you can see in the log? Could you share the whole log? Upload it to gist.github.com if it is too big.
