Cannot recover index - store.found: false

After a full cluster restart, some of my indices remained red. Among them a few had replicas...

The cluster allocation explain API tells me that a pri 3, rep 1 index is missing its primary shard 0; on every node I get:

      "node_decision": "no",
      "store": {
        "found": false
      }

Well OK, I have a replica! But alas, it cannot be allocated:

          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        }

So... what is the proper way to handle this? I guess it has something to do with the cluster reroute API?
I just want to promote replica 0 to primary and forget about the loss of the original primary 0.
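For reference, I found that the reroute API has an allocate_stale_primary command, though as far as I understand it this only works if some node still holds an on-disk copy of the shard, and accept_data_loss acknowledges that a stale copy may be missing recent writes. (Index and node names below are placeholders:)

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "FILL_IN_INDEX_NAME_HERE",
        "shard": 0,
        "node": "FILL_IN_NODE_NAME_HERE",
        "accept_data_loss": true
      }
    }
  ]
}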

Ok, it seems like those indices with replicas recover after some time, despite the initial error message.

However, I still have indices with 0 replicas where the cluster allocation explain API reports one of the primary shards as:

      "node_decision": "no",
      "store": {
        "found": false
      }

It looks like it's missing shard data, but it was nothing more than a simple node restart... How can I restore this shard?

Also, is there a recommended way to restart nodes other than via systemctl restart?

Maybe stop, wait for the java process to disappear, then start? Is systemctl restart ill-advised?

systemctl restart is reasonable, and shouldn't cause this situation. I think something else is wrong with your setup. Can you share the full output of this command?

GET _cluster/allocation/explain
{"index":"FILL_IN_INDEX_NAME_HERE","shard":0,"primary":true}
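As an aside, the usual full-cluster restart routine for 6.x is to disable shard allocation and do a synced flush before stopping the nodes, then re-enable allocation once they are back. Roughly:

PUT _cluster/settings
{"persistent": {"cluster.routing.allocation.enable": "none"}}

POST _flush/synced

(restart the nodes, e.g. via systemctl)

PUT _cluster/settings
{"persistent": {"cluster.routing.allocation.enable": null}}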

Here's the missing shard of 'myindex-2019-05-16'. (I tried to anonymize the data, but kept all the relevant information...)

  "index": "myindex-2019-05-16",
  "shard": 1,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2020-03-16T20:20:07.621Z",
    "details": "node_left [wvzK.....nodeid]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "1JRBD......",
      "node_name": "elasticsearch-siteone-07.mydomain.example.com",
      "transport_address": "172.16.141.56:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "box_type": "hot"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "4v4Z...",
      "node_name": "elasticsearch-sitetwo-08.mydomain.example.com",
      "transport_address": "172.16.161.57:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "box_type": "hot"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
#THE ABOVE REPEATS FOR ALL DATA NODES

Ok, if every data node reports store.found: false then this shard is gone. Are you sure you're using storage that persists across restarts?
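One quick sanity check: confirm that every node's path.data actually points at the persistent storage you expect, e.g.:

GET _nodes/settings?filter_path=nodes.*.settings.path.data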

Yes data is stored on raid 10/60 depending on how often it is accessed. I'm not aware of any disk failures. (And this is the second time a cluster restart resulted in a couple of red indices... :frowning: )

Is it possible that the data files are on the node, but are not being read/found by Elasticsearch?

Also, this is version 6.8. And the cluster is overloaded, we are in the process of drastically reducing shard count and data stored in the cluster.

Hang on for a sec, I have an idea...

It seems that only those indices are really affected that were 0 or 1 days old. (The one in the example may be the result of an aborted shrink job(?)).

Also we have ZFS under the cluster, maybe that has something to do with it... (Yeah, I was wrong about RAID10/60, that was the earlier cluster; this one has mirrors.)

So... if I have an index with a shard missing (store.found:false), but I do have the replica of the missing shard... How do I promote that replica to primary?

The cluster allocation explain API says:

          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"

store.found: false means there is no copy of the shard at all.
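If there really is no surviving copy anywhere and you have no snapshot to restore from, the only way to get the index back to green is to allocate a brand-new empty primary, which permanently discards whatever that shard held. (Index and node names below are placeholders; only do this once you've accepted the data is gone:)

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "FILL_IN_INDEX_NAME_HERE",
        "shard": 1,
        "node": "FILL_IN_NODE_NAME_HERE",
        "accept_data_loss": true
      }
    }
  ]
}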

Ok, but what about indices with replica shards? It seems that the replica is available but won't be promoted to primary or assigned at all.

{
  "index": "xx-2020-03-15",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2020-03-17T00:02:54.598Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "1JR...",
      "node_name": "xxx",
      "transport_address": "ip:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "replica_after_primary_active",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        },
        {
          "decider": "throttling",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        }
      ]
    },

No, that's not what this means. Until the shards are allocated there's not really any difference between primaries and replicas, and we don't even bother looking for possible replicas until we've assigned a primary. So store.found: false means that there's no copy of this shard, neither primary nor replica.

Ok, thank you for your help. I guess we need to investigate further how this could have happened.

Okay, this was very stupid on my part: one of the 20+ nodes was actually down. I trusted our automation system too much and didn't look close enough.
