Elasticsearch index folder contains corrupted file

Recently we found that some of our indices cannot get their shards allocated (as a consequence, the cluster state remains red).
When we checked the index files, we found a file named "corrupted_SWN5k349Qx2lZY1fWq0n1g".
Its content is as follows:

?×lstore ±"8failed engine (reason: [corrupt file (source: [start])])_codec footer mismatch (file truncated?): actual footer=139953464 vs expected footer=-1071082520^MMapIndexInput(path="/data/nodes/0/indices/EUWYntUMQEOBPeduvbhGgw/0/index/_fe_Lucene50_0.tim") "org.apache.lucene.codecs.CodecUtilCodecUtil.javavalidateFooterö"org.apache.lucene.codecs.CodecUtilCodecUtil.javaretrieveChecksumç7org.apache.lucene.codecs.blocktree.BlockTreeTermsReaderBlockTreeTermsReader.java¡8org.apache.lucene.codecs.lucene50.Lucene50PostingsFormateLucene50PostingsFormat.javafieldsProducer½Eorg.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReaderePerFieldPostingsFormat.java¤8org.apache.lucene.codecs.perfield.PerFieldPostingsFormatePerFieldPostingsFormat.javafieldsProducerô*org.apache.lucene.index.SegmentCoreReadersSegmentCoreReaders.javap%org.apache.lucene.index.SegmentReaderSegmentReader.javaN)org.apache.lucene.index.ReadersAndUpdatesReadersAndUpdates.java getReaderÐ)org.apache.lucene.index.ReadersAndUpdatesReadersAndUpdates.javagetReadOnlyClone‚/org.apache.lucene.index.StandardDirectoryReaderStandardDirectoryReader.javaopeni#org.apache.lucene.index.IndexWriterIndexWriter.java getReaderê'org.apache.lucene.index.DirectoryReaderDirectoryReader.javaopeng'org.apache.lucene.index.DirectoryReaderDirectoryReader.javaopenO-org.elasticsearch.index.engine.InternalEngineInternalEngine.javacreateSearcherManagerÐ-org.elasticsearch.index.engine.InternalEngineInternalEngine.javaÝ4org.elasticsearch.index.engine.InternalEngineFactoryInternalEngineFactory.javanewReadWriteEngine(org.elasticsearch.index.shard.IndexShardIndexShard.java newEngineã(org.elasticsearch.index.shard.IndexShardIndexShard.javacreateNewEngineÑ(org.elasticsearch.index.shard.IndexShardIndexShard.javainternalPerformTranslogRecovery±
(org.elasticsearch.index.shard.IndexShardIndexShard.javaperformTranslogRecovery‰
+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javainternalRecoverFromStore…+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javalambda$recoverFromStore$0]+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javaexecuteRecoveryž+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javarecoverFromStore[(org.elasticsearch.index.shard.IndexShardIndexShard.javarecoverFromStore£(org.elasticsearch.index.shard.IndexShardIndexShard.javalambda$startRecovery$5íPorg.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnableThreadContext.javarun¹'java.util.concurrent.ThreadPoolExecutorThreadPoolExecutor.java runWorkerý.java.util.concurrent.ThreadPoolExecutor$WorkerThreadPoolExecutor.javarunðjava.lang.ThreadThread.javarunì %org.elasticsearch.index.engine.EngineEngine.java
failEngineê%org.elasticsearch.index.engine.EngineEngine.javamaybeFailEngine€a-org.elasticsearch.index.engine.InternalEngineInternalEngine.javamaybeFailEngine¹-org.elasticsearch.index.engine.InternalEngineInternalEngine.javacreateSearcherManagerÖ-org.elasticsearch.index.engine.InternalEngineInternalEngine.javaÝ4org.elasticsearch.index.engine.InternalEngineFactoryInternalEngineFactory.javanewReadWriteEngine(org.elasticsearch.index.shard.IndexShardIndexShard.java newEngineã(org.elasticsearch.index.shard.IndexShardIndexShard.javacreateNewEngineÑ(org.elasticsearch.index.shard.IndexShardIndexShard.javainternalPerformTranslogRecovery±
(org.elasticsearch.index.shard.IndexShardIndexShard.javaperformTranslogRecovery‰
+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javainternalRecoverFromStore…+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javalambda$recoverFromStore$0]+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javaexecuteRecoveryž+org.elasticsearch.index.shard.StoreRecoveryStoreRecovery.javarecoverFromStore[(org.elasticsearch.index.shard.IndexShardIndexShard.javarecoverFromStore£(org.elasticsearch.index.shard.IndexShardIndexShard.javalambda$startRecovery$5íPorg.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnableThreadContext.javarun¹'java.util.concurrent.ThreadPoolExecutorThreadPoolExecutor.java runWorkerý.java.util.concurrent.ThreadPoolExecutor$WorkerThreadPoolExecutor.javarunðjava.lang.ThreadThread.javarunì À(“è Ïcä$

Please help if anyone knows:

  1. What might be the possible cause of this corruption?
  2. Is there any way to recover from this corruption?

PS. We are using Elasticsearch 6.0.1

Additional info:
Running /_cluster/allocation/explain?pretty gives the following:

{
  "index" : "check",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_REOPENED",
    "at" : "2019-12-27T10:54:47.409Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "FRxaEjtqSROTMnAVdCs4lg",
      "node_name" : "FRxaEjt",
      "transport_address" : "<ip_address>:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "_4cYlrP4R6ehcW08lOma9g",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [corrupt file (source: [start])])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "codec footer mismatch (file truncated?): actual footer=139953464 vs expected footer=-1071082520 (resource=MMapIndexInput(path=\"/data/nodes/0/indices/EUWYntUMQEOBPeduvbhGgw/0/index/_fe_Lucene50_0.tim\"))"
            }
          }
        }
      }
    }
  ]
}
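For anyone reproducing this, the request above can be scripted. The following is a minimal sketch using only the Python standard library; the base URL `http://localhost:9200` and the helper names `build_explain_request` and `summarize` are assumptions for illustration, not part of the original post. It asks about the same shard (index `check`, shard 0, primary) and reduces the response to the fields that matter here.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard, primary):
    """Build (but do not send) a cluster allocation explain request for one shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url=f"{base_url}/_cluster/allocation/explain?pretty",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def summarize(explanation):
    """Return the overall verdict plus (node name, decision, store exception type) per node."""
    nodes = []
    for decision in explanation.get("node_allocation_decisions", []):
        store_exception = decision.get("store", {}).get("store_exception", {})
        nodes.append(
            (
                decision.get("node_name"),
                decision.get("node_decision"),
                store_exception.get("type"),
            )
        )
    return explanation.get("can_allocate"), nodes


# Usage against a live cluster (the base URL is an assumption):
# req = build_explain_request("http://localhost:9200", "check", 0, True)
# with urllib.request.urlopen(req) as resp:
#     print(summarize(json.load(resp)))
```

Applied to the output above, `summarize` would report `no_valid_shard_copy` with a single node whose store exception type is `corrupt_index_exception`.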

The file corrupted_SWN5k349Qx2lZY1fWq0n1g is a marker indicating that something else in the folder (_fe_Lucene50_0.tim) was discovered to be corrupt.
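To see which other shards carry such a marker, you can scan the data path for files whose names start with `corrupted_`. This is a rough sketch: the helper name `find_corruption_markers` is hypothetical, and the data path in the usage comment is taken from the error message above.

```python
from pathlib import Path


def find_corruption_markers(data_path):
    """Return all files named 'corrupted_*' anywhere under data_path, sorted by path."""
    root = Path(data_path)
    return sorted(p for p in root.rglob("corrupted_*") if p.is_file())


# Usage (path taken from the error message in the question):
# for marker in find_corruption_markers("/data/nodes/0/indices"):
#     print(marker)
```

Each hit identifies a shard directory that Elasticsearch has flagged. The marker itself is not the damaged file; it only records the failure.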

The corruption in _fe_Lucene50_0.tim indicates that the content of this file now differs from what Elasticsearch wrote. This is most likely due to a problem with your storage system (disk, filesystem, etc.).
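To illustrate what the "codec footer mismatch" check is looking at, here is a rough diagnostic sketch. It assumes the footer layout used by Lucene's CodecUtil as I understand it: the last 16 bytes of the file are a big-endian 32-bit magic value, a 32-bit algorithm ID, and a 64-bit CRC32 of everything before the checksum. The expected magic value is the "expected footer" from the error message; the function name `check_footer` is hypothetical. It is not a replacement for Lucene's own verification, and you should only run something like this against a copy of the file.

```python
import struct
import zlib

FOOTER_MAGIC = -1071082520  # "expected footer" value from the error message
FOOTER_LENGTH = 16          # assumed: int32 magic + int32 algorithm ID + int64 checksum


def check_footer(path, chunk_size=1 << 20):
    """Return 'ok' or a short description of what is wrong with the file's footer."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        if size < FOOTER_LENGTH:
            return "file too short to contain a footer"

        f.seek(size - FOOTER_LENGTH)
        magic, _algorithm_id, stored_crc = struct.unpack(">iiQ", f.read(FOOTER_LENGTH))
        if magic != FOOTER_MAGIC:
            return f"footer mismatch: actual footer={magic} vs expected footer={FOOTER_MAGIC}"

        # The checksum covers every byte except the final 8 checksum bytes.
        f.seek(0)
        remaining = size - 8
        crc = 0
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            remaining -= len(chunk)

        if crc != stored_crc:
            return f"checksum mismatch: computed {crc} vs stored {stored_crc}"
        return "ok"
```

A truncated file typically fails at the magic comparison, because the bytes that now sit at the end of the file are ordinary data rather than the footer. That matches the "file truncated?" hint in the exception.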

Normally Elasticsearch would recover automatically using a good copy on a different node, but you seem to have only one data node, so that won't work. You can manually recover this index by closing it and restoring it from a snapshot.
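The close-and-restore sequence could look roughly like the sketch below, assuming a snapshot that contains the index was taken before the corruption occurred. The repository name `my_backup`, the snapshot name `snapshot_1`, the base URL, and the helper name `build_recovery_requests` are all placeholders for illustration; substitute your own values.

```python
import json
import urllib.request


def build_recovery_requests(base_url, index, repository, snapshot):
    """Build (but do not send) the two requests: close the index, then restore it."""
    close_request = urllib.request.Request(
        url=f"{base_url}/{index}/_close",
        method="POST",
    )
    restore_body = json.dumps({"indices": index})
    restore_request = urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}/{snapshot}/_restore?wait_for_completion=true",
        data=restore_body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return [close_request, restore_request]


# Usage against a live cluster (all names below are placeholders):
# for req in build_recovery_requests("http://localhost:9200", "check", "my_backup", "snapshot_1"):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```

The order matters: the index is closed first and only then restored over. Restricting `indices` to the affected index leaves the other indices in the cluster untouched.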
