Greetings, people who know way more about Elasticsearch than I do! I'm very new to Elastic administration, as our organization has only recently started using it, so please pardon my lack of knowledge.
We have a 4-node cluster running Elasticsearch 6.8.13, and we have a corrupted index.
Curling "/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED" returns the following output:
portal_CorruptedIndexName_1625203723 3 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723 3 r UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723 0 p UNASSIGNED MANUAL_ALLOCATION
portal_CorruptedIndexName_1625203723 0 r UNASSIGNED MANUAL_ALLOCATION
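(In case the quoting above is ambiguous, the full command looks roughly like this, with the node address replaced by a placeholder:)

curl -s "http://<node>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED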
Following is the output from "/_cluster/allocation/explain":
{
  "index" : "portal_CorruptedIndexName_1625203723",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "MANUAL_ALLOCATION",
    "at" : "2021-07-16T06:34:18.221Z",
    "details" : "failed shard on node [iyfZi8qOSUuejn7IHNZaBw]: shard failure, reason [corrupt file (source: [start])], failure CorruptIndexException[Problem reading index. (resource=/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim)]; nested: NoSuchFileException[/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim]; ",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "W5YlxEiMQv2iely9YJn9sw",
      "node_name" : "XXXXXXELSC01N02",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "iyfZi8qOSUuejn7IHNZaBw",
      "node_name" : "XXXXXXELSC01N04",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "vzIbJUz1TgSOmRZsITp0SQ",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [corrupt file (source: [start])])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "Problem reading index. (resource=/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim)",
              "caused_by" : {
                "type" : "no_such_file_exception",
                "reason" : "/elasticsearch/data/nodes/0/indices/2t3xC5NqS468Ft5hFP61Qg/3/index/_fy_Lucene50_0.tim"
              }
            }
          }
        }
      }
    },
    {
      "node_id" : "mmjyzudUR5Sapj23Y0AdRw",
      "node_name" : "XXXXXXELSC01N01",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "wBDiKORZRt-SP-F3ZcvlsA",
      "node_name" : "XXXXXXELSC01N03",
      "node_attributes" : {
        "ml.machine_memory" : "8041381888",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "7VC_Ru5VRbiZxZQy8_8N_w"
      }
    }
  ]
}
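In case it helps, the explain API can also be pointed at the specific failing shard with a request body like the following (the host is a placeholder; the index/shard values match the failing primary above):

curl -s -H "Content-Type: application/json" -X GET "http://<node>:9200/_cluster/allocation/explain?pretty" -d '{
  "index" : "portal_CorruptedIndexName_1625203723",
  "shard" : 3,
  "primary" : true
}'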
Unfortunately, we didn't discover that our snapshots had stopped running until after the index became corrupted, so restoring from a snapshot is not an option. Since the corruption appears to stem from a missing file, I tried shutting down the node where the file was missing and letting the cluster fully recover, hoping the shards would be reallocated, but that didn't resolve the issue.
I'm wondering whether it's possible to recreate that file, or otherwise recover the index, ideally without losing any data.
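For example, would something like the reroute API with allocate_stale_primary be a reasonable next step here, or is that guaranteed to lose data (or simply fail) when the only in-sync copy is the corrupted one on XXXXXXELSC01N04? A rough sketch of what I mean, with the host as a placeholder:

curl -s -H "Content-Type: application/json" -X POST "http://<node>:9200/_cluster/reroute?pretty" -d '{
  "commands" : [
    {
      "allocate_stale_primary" : {
        "index" : "portal_CorruptedIndexName_1625203723",
        "shard" : 3,
        "node" : "XXXXXXELSC01N04",
        "accept_data_loss" : true
      }
    }
  ]
}'

I've also seen the elasticsearch-shard tool (remove-corrupted-data) mentioned for situations like this, but I haven't tried either approach yet and would appreciate guidance on which, if any, is safe.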
Many thanks in advance for any assistance!