Corrupt index due to missing file

Greetings, people who know way more about Elasticsearch than I do! I'm very new to Elasticsearch administration, as our organization has only recently started using it, so please pardon my lack of knowledge.

We have a 4-node cluster running Elasticsearch 6.8.13, and we have a corrupted index.

Running curl against "/_cat/shards?h=index,shard,prirep,state,unassigned.reason" and piping the result through "grep UNASSIGNED" returns the following output:

portal_CorruptedIndexName_1625203723           3 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           3 r UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           0 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           0 r UNASSIGNED MANUAL_ALLOCATION
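
For anyone who wants to do the same filtering in a script rather than with grep, here is a minimal sketch in Python. It assumes the `_cat/shards` response has already been fetched as plain text with the five columns requested above; the names `ShardRow` and `unassigned_shards` are hypothetical, not part of any Elasticsearch client.

```python
from typing import List, NamedTuple


class ShardRow(NamedTuple):
    index: str
    shard: int
    prirep: str   # "p" for primary, "r" for replica
    state: str
    reason: str   # unassigned.reason column; empty for assigned shards


def unassigned_shards(cat_output: str) -> List[ShardRow]:
    """Parse `_cat/shards?h=index,shard,prirep,state,unassigned.reason` text
    and keep only the rows whose state is UNASSIGNED."""
    rows = []
    for line in cat_output.splitlines():
        parts = line.split()
        # Skip blank lines, header rows, and anything else that does not
        # look like "index shard prirep state [reason]".
        if len(parts) < 4 or not parts[1].isdigit():
            continue
        reason = parts[4] if len(parts) > 4 else ""
        row = ShardRow(parts[0], int(parts[1]), parts[2], parts[3], reason)
        if row.state == "UNASSIGNED":
            rows.append(row)
    return rows


# Sample input copied from the output shown in this post.
sample = """\
portal_CorruptedIndexName_1625203723           3 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           3 r UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           0 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723           0 r UNASSIGNED MANUAL_ALLOCATION
"""

for row in unassigned_shards(sample):
    print(row.index, row.shard, row.prirep, row.reason)
```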

The following is the output from "/_cluster/allocation/explain":

{
  "index" : "portal_CorruptedIndexName_1625203723",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "MANUAL_ALLOCATION",
    "at" : "2021-07-16T06:34:18.221Z",
    "details" : "failed shard on node [iyfZi8qOSUuejn7IHNZaBw]: shard failure, reason [corrupt file (source: [start])], failure CorruptIndexException[Problem reading index. (resource=/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim)]; nested: NoSuchFileException[/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim]; ",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "W5YlxEiMQv2iely9YJn9sw",
      "node_name" : "XXXXXXELSC01N02",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "iyfZi8qOSUuejn7IHNZaBw",
      "node_name" : "XXXXXXELSC01N04",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "vzIbJUz1TgSOmRZsITp0SQ",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [corrupt file (source: [start])])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "Problem reading index. (resource=/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim)",
              "caused_by" : {
                "type" : "no_such_file_exception",
                "reason" : "/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim"
              }
            }
          }
        }
      }
    },
    {
      "node_id" : "mmjyzudUR5Sapj23Y0AdRw",
      "node_name" : "XXXXXXELSC01N01",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "wBDiKORZRt-SP-F3ZcvlsA",
      "node_name" : "XXXXXXELSC01N03",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "7VC_Ru5VRbiZxZQy8_8N_w"
      }
    }
  ]
}
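
To make the per-node picture easier to read, the response above can be boiled down to one line per node. This is only a sketch: it assumes the JSON has already been loaded into a Python dict, the function name `summarize_copies` is hypothetical, and the status labels are my own wording rather than Elasticsearch terms.

```python
from typing import Any, Dict, List, Tuple


def summarize_copies(explain: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (node_name, status) pairs from an allocation explain response."""
    summary = []
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        if store.get("found") is False:
            status = "no copy on disk"
        elif "store_exception" in store:
            # A copy exists but could not be opened; report the outer exception type.
            status = "copy present but unreadable: " + store["store_exception"]["type"]
        elif store.get("in_sync") is False:
            status = "stale copy (not in sync)"
        elif store.get("in_sync") is True:
            status = "in-sync copy, no exception reported"
        else:
            status = "unknown"
        summary.append((decision["node_name"], status))
    return summary


# Trimmed version of the response shown above (only the fields used here).
explain = {
    "node_allocation_decisions": [
        {"node_name": "XXXXXXELSC01N02", "store": {"found": False}},
        {
            "node_name": "XXXXXXELSC01N04",
            "store": {
                "in_sync": True,
                "allocation_id": "vzIbJUz1TgSOmRZsITp0SQ",
                "store_exception": {"type": "corrupt_index_exception"},
            },
        },
        {"node_name": "XXXXXXELSC01N01", "store": {"found": False}},
        {
            "node_name": "XXXXXXELSC01N03",
            "store": {"in_sync": False, "allocation_id": "7VC_Ru5VRbiZxZQy8_8N_w"},
        },
    ]
}

for node_name, status in summarize_copies(explain):
    print(node_name, "->", status)
```

Read this way, the output says that only two nodes hold any copy of shard 3: the in-sync copy on XXXXXXELSC01N04 is the corrupt one, and the copy on XXXXXXELSC01N03 is stale.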

Unfortunately, we didn't discover that our snapshots had stopped running until after the index became corrupted, so restoring from a snapshot is not an option. Since the cause of the corruption appears to be a missing file, I tried shutting down the node where the file was missing and letting the cluster fully recover, hoping that would reallocate the index. That didn't resolve the issue.

I'm wondering if it's possible to do something to recreate that file and recover the index, ideally without losing any data.

Many thanks in advance for any assistance!

Welcome to our community! :smiley:

You've probably lost data, as that particular file no longer exists.
I'm not sure how to recover from here though, sorry. Hopefully someone else can jump in.

Welcome! This is also my first time seeing this error, but it looks like it's been raised and solved before. Does this also work in your case?


No, that previous solution says to restore the index from a snapshot, which isn't an option in this case.
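
For readers who land here in the same situation: the allocation explain output earlier in the thread reports a stale (not in-sync) copy of shard 3 on XXXXXXELSC01N03. The cluster reroute API has an `allocate_stale_primary` command that can promote such a copy to primary, at the cost of losing any writes that never reached it. The sketch below only builds the request body for `POST /_cluster/reroute`; it sends nothing. The helper name `stale_primary_command` is hypothetical, and whether that stale copy is actually readable is an assumption nobody in this thread verified. Shard 0 is also unassigned, and its explain output was not posted, so nothing here applies to it.

```python
import json
from typing import Any, Dict


def stale_primary_command(index: str, shard: int, node: str) -> Dict[str, Any]:
    """Build a body for POST /_cluster/reroute that promotes a stale copy.

    The command requires accept_data_loss to be true: any writes that never
    reached the stale copy are lost for good.
    """
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


# Values taken from the allocation explain output in this thread.
body = stale_primary_command(
    "portal_CorruptedIndexName_1625203723", 3, "XXXXXXELSC01N03"
)
print(json.dumps(body, indent=2))
```

This does not recreate the missing `.tim` file or recover the corrupt copy; it only brings the shard back with whatever data the stale copy holds, so it is a last resort when no snapshot exists.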
