One of the primary shards of the index is corrupt

Hi, I am getting this error. Is there any way to restore this shard?

GET /_cluster/allocation/explain?pretty
{
  "index" : "es-document-000004",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2023-03-31T20:10:08.007Z",
    "failed_allocation_attempts" : 4,
    "details" : "failed shard on node [ULQc2PtCRgmAZCeqhYYzNg]: shard failure, reason [corrupt file (source: [start])], failure java.nio.file.FileSystemException: /data/indices/nAINL_GfRy2W91poWjOPBQ/2/index/_jx.fdx: Too many open files\n\tat sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)\n\tat sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181)\n\tat java.nio.channels.FileChannel.open(FileChannel.java:298)\n\tat java.nio.channels.FileChannel.open(FileChannel.java:357)\n\tat org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)\n\tat org.elasticsearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:124)\n\tat org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)\n\tat org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)\n\tat org.apache.lucene.codecs.lucene90.compressing.FieldsIndexReader.<init>(FieldsIndexReader.java:68)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:166)\n\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsReader(Lucene90CompressingStoredFieldsFormat.java:133)\n\tat org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsReader(Lucene90StoredFieldsFormat.java:136)\n\tat org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:138)\n\tat org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:91)\n\tat org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:179)\n\tat org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:221)\n\tat org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:535)\n\tat 
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:138)\n\tat org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:597)\n\tat org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:112)\n\tat org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:91)\n\tat org.elasticsearch.index.engine.InternalEngine.createReaderManager(InternalEngine.java:617)\n\tat org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:245)\n\tat org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:191)\n\tat org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:14)\n\tat org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1930)\n\tat org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1894)\n\tat org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:461)\n\tat org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:88)\n\tat org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:462)\n\tat org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:86)\n\tat org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2239)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.lang.Thread.run(Thread.java:833)\n\tSuppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (4f805c91). 
possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/data/indices/nAINL_GfRy2W91poWjOPBQ/2/index/_jx.fdm\")))\n\t\tat org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:500)\n\t\tat org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.<init>(Lucene90CompressingStoredFieldsReader.java:208)\n\t\t... 28 more\n",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "Elasticsearch can't allocate this shard because all the copies of its data in the cluster are stale or corrupt. Elasticsearch will allocate this shard when a node containing a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot.",
  "node_allocation_decisions" : [
    {
      "node_id" : "ULQc2PtCRgmAZCeqhYYzNg",
      "node_name" : "data2",
      "transport_address" : "x.x.x.x:9300",
      "node_attributes" : {
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "RGteemkTTDCvVgTehBo0JA",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [corrupt file (source: [start])])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "checksum passed (4f805c91). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/data/indices/nAINL_GfRy2W91poWjOPBQ/2/index/_jx.fdm\")))"
            }
          }
        }
      }
    },
    {
      "node_id" : "wbTIeBzwR1ODyoeTs9nQ2w",
      "node_name" : "data1",
      "transport_address" : "x.x.x.x:9300",
      "node_attributes" : {
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "SrLEjFdsCKR3ucQbvK64jQkg"
      }
    }
  ]
}

Hi,

You appear to have two options:

  1. Delete the index and restore it from a snapshot
  2. If a replica copy exists, delete the corrupt primary and let the replica be promoted

However, based on the error output, if a node held an in-sync replica copy, the second option would kick in automatically and correct the issue on its own. In your case the only other copy of shard 2 (on data1) is reported with "in_sync" : false, so Elasticsearch will not promote it.
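As an absolute last resort, if you have no snapshot, there is also the _cluster/reroute API with an allocate_stale_primary command, which force-allocates that stale copy on data1 as the new primary and permanently discards whatever writes it is missing. A sketch, assuming a node reachable on localhost:9200 (adjust host/auth to your setup):

```shell
# Force-allocate the stale copy on data1 as the new primary.
# "accept_data_loss": true is mandatory -- any documents missing
# from the stale copy are permanently lost.
curl -s -X POST 'localhost:9200/_cluster/reroute?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "es-document-000004",
        "shard": 2,
        "node": "data1",
        "accept_data_loss": true
      }
    }
  ]
}'
```

Only do this once you are certain no better copy of shard 2 exists anywhere.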

That leaves restoring from a snapshot, if you have one.
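If a snapshot repository is registered, the restore is roughly as follows; the repository name my_repo and snapshot name snap_1 are placeholders for your own, and the node is assumed reachable on localhost:9200:

```shell
# List snapshots in the repository to find one containing the index
curl -s 'localhost:9200/_snapshot/my_repo/_all?pretty'

# Delete the broken index first -- a restore cannot
# overwrite an existing open index of the same name
curl -s -X DELETE 'localhost:9200/es-document-000004'

# Restore only this index from the chosen snapshot
curl -s -X POST 'localhost:9200/_snapshot/my_repo/snap_1/_restore?wait_for_completion=true' \
  -H 'Content-Type: application/json' -d '
{
  "indices": "es-document-000004"
}'
```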

You could attempt a restart of the Elasticsearch node itself in the hope that a "have you tried turning it off and on again" resolves it, but I wouldn't bet any money on it. That said, the root failure in your stack trace is "Too many open files", and the suppressed exception even reports that the checksum passed ("possibly transient resource issue"), so this may be a file descriptor limit problem rather than true on-disk corruption.
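Before restarting, it's worth checking whether the file descriptor limit really is the problem, since that is what the trace reports. Assuming a node reachable on localhost:9200:

```shell
# Limits each Elasticsearch node actually got at startup vs. current usage
curl -s 'localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors,**.open_file_descriptors&pretty'

# Shell-level limit on the host; Elasticsearch recommends at least 65535
ulimit -n
```

If max_file_descriptors is low (e.g. 1024 or 4096), raise the limit in the systemd unit or /etc/security/limits.conf before restarting the node.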
