If a shard fails with no replica, is the data lost entirely?

jtalmi · September 11, 2018, 2:57am

Hi,

I have a single-shard index with no replicas running on a single-node cluster. The shard has apparently failed. I'm not sure why yet, but I'd like to know whether there are any options for recovering the data.

{
  "index" : "twitter",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2018-09-11T02:49:41.476Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [CxXWE8BiQbS4ThB9AvvGQA]: failed recovery, failure RecoveryFailedException[[twitter][0]: Recovery failed on {node-1}{CxXWE8BiQbS4ThB9AvvGQA}{8tah3WOuSlSKhQnqqmV2aQ}{10.142.0.2}{10.142.0.2:9300}{ml.machine_memory=3872485376, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: CorruptIndexException[misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=SimpleFSIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/l1VcSQySRmuyFGTBBPjX9g/0/translog/translog-1228.ckp\"))]; ",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "CxXWE8BiQbS4ThB9AvvGQA",
      "node_name" : "node-1",
      "transport_address" : "10.142.0.2:9300",
      "node_attributes" : {
        "ml.machine_memory" : "3872485376",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "gxegPAMyQa21MH5NxQEACw"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-09-11T02:49:41.476Z], failed_attempts[5], delayed=false, details[failed shard on node [CxXWE8BiQbS4ThB9AvvGQA]: failed recovery, failure RecoveryFailedException[[twitter][0]: Recovery failed on {node-1}{CxXWE8BiQbS4ThB9AvvGQA}{8tah3WOuSlSKhQnqqmV2aQ}{10.142.0.2}{10.142.0.2:9300}{ml.machine_memory=3872485376, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: CorruptIndexException[misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=SimpleFSIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/l1VcSQySRmuyFGTBBPjX9g/0/translog/translog-1228.ckp\"))]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}
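The failure details in this output name the exact file that could not be read. Below is a hedged sketch for pulling that path out of the allocation-explain JSON with sed. It assumes the details string keeps the escaped path=\"...\" form shown above, and extract_corrupt_path is a hypothetical name, not an Elasticsearch tool:

```shell
#!/bin/sh
# Hypothetical helper: reads allocation-explain JSON on stdin and prints the
# path of the file named in a CorruptIndexException detail such as
#   ...SimpleFSIndexInput(path=\"/var/lib/.../translog-1228.ckp\"))...
# The same path can appear in several fields (details, explanation),
# so duplicate lines are removed.
extract_corrupt_path() {
  # -n: print only lines where the substitution succeeded.
  # The capture group stops at the escaped closing quote (\").
  sed -nE 's/.*path=\\"([^"\\]*)\\".*/\1/p' | sort -u
}
```

For example, piping the output of curl -s 'localhost:9200/_cluster/allocation/explain?pretty' into extract_corrupt_path (assuming a node listening on localhost:9200) should print /var/lib/elasticsearch/nodes/0/indices/l1VcSQySRmuyFGTBBPjX9g/0/translog/translog-1228.ckp for the output above.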

Running the reroute command manually (/_cluster/reroute?retry_failed=true, as suggested in the max_retry explanation) doesn't help. I've seen some tutorials suggesting repairing the index directly with Lucene, e.g.:

java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /opt/elasticsearch-data/elastic_search/nodes/0/indices/my_index/2/index/ -fix

but when I run CheckIndex, it says my index is fine.
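One likely reason CheckIndex passes: it validates the Lucene segment files under the shard's index/ directory, whereas the exception above points at translog/translog-1228.ckp, an Elasticsearch translog checkpoint that CheckIndex does not read. A minimal, read-only sketch for spotting checkpoint files too short to hold the 16-byte footer the error mentions (find_truncated_checkpoints is a hypothetical name, and a find supporting -maxdepth is assumed):

```shell
#!/bin/sh
# Hypothetical helper: lists translog checkpoint files (*.ckp) in a directory
# that are smaller than the codec footer length reported in the error
# (16 bytes by default). Such files cannot contain a valid footer, which is
# what "length=0 but footerLength==16" complains about.
# Read-only: nothing is modified or deleted.
find_truncated_checkpoints() {
  dir=$1
  min_bytes=${2:-16}
  # -size -Nc matches regular files strictly smaller than N bytes
  find "$dir" -maxdepth 1 -type f -name '*.ckp' -size -"${min_bytes}"c
}
```

Run against the directory from the error, e.g. find_truncated_checkpoints /var/lib/elasticsearch/nodes/0/indices/l1VcSQySRmuyFGTBBPjX9g/0/translog; given length=0 in the exception, translog-1228.ckp would be expected in the output.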
