Elasticsearch is in red state: class org.apache.lucene.store.BufferedChecksumIndexInput cannot seek backwards

Hi, we are running Elasticsearch 5.2 on a 3-node cluster. Today the cluster went into red state and it does not seem to be recovering. Could someone please take a look?
What could have caused this state? Do we have to reroute the shard manually, possibly with data loss?

{
"index" : "cfileindex",
"shard" : 8,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2017-06-25T11:15:18.550Z",
"failed_allocation_attempts" : 5,
"details" : "failed recovery, failure RecoveryFailedException[[cfileindex][8]: Recovery failed on {14038005774639}{ui1ui89bSzmlBCQ0qD7SiQ}{f-LtGPOSSk2X-lljpAL0DA}{10.2.32.169}{10.2.32.169:25800}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[class org.apache.lucene.store.BufferedChecksumIndexInput cannot seek backwards (pos=-16 getFilePointer()=0)]; ",
"last_allocation_status" : "no"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
"node_allocation_decisions" : [
{
"node_id" : "ui1ui89bSzmlBCQ0qD7SiQ",
"node_name" : "14038005774639",
"transport_address" : "10.2.32.169:25800",
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "sQWGfvJIRwqWn2eNjVn3pg"
},
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2017-06-25T11:15:18.550Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[cfileindex][8]: Recovery failed on {14038005774639}{ui1ui89bSzmlBCQ0qD7SiQ}{f-LtGPOSSk2X-lljpAL0DA}{10.2.32.169}{10.2.32.169:25800}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[class org.apache.lucene.store.BufferedChecksumIndexInput cannot seek backwards (pos=-16 getFilePointer()=0)]; ], allocation_status[deciders_no]]]"
}
]
}
]
}
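
For reference, the max_retry decider in the output above names the retry call itself, and the manual option asked about corresponds to a reroute command that carries accept_data_loss. Below is a minimal Python sketch of those requests, not a verified fix. Assumptions: the HTTP endpoint http://localhost:9200 (the output only shows the transport port 25800), the helper function names, and that allocate_stale_primary is the relevant command here. Since the on-disk copy fails recovery with the seek error, a forced allocation of that same copy may well fail the same way.

```python
import json
import urllib.request

# Assumption: HTTP address of any node in the cluster. The post only shows
# the transport address (10.2.32.169:25800), not the HTTP port.
ES_URL = "http://localhost:9200"


def build_retry_failed_request(base_url=ES_URL):
    """POST /_cluster/reroute?retry_failed=true, as named by the max_retry decider."""
    return urllib.request.Request(
        url=f"{base_url}/_cluster/reroute?retry_failed=true",
        method="POST",
    )


def build_explain_request(index, shard, primary=True, base_url=ES_URL):
    """Allocation explain request for one shard (produces output like the JSON above)."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url=f"{base_url}/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_allocate_stale_primary_request(index, shard, node, base_url=ES_URL):
    """Manual reroute that forces a stale copy to become primary.

    This can lose data, which is why accept_data_loss must be set explicitly.
    """
    body = json.dumps({
        "commands": [{
            "allocate_stale_primary": {
                "index": index,
                "shard": shard,
                "node": node,
                "accept_data_loss": True,
            }
        }]
    })
    return urllib.request.Request(
        url=f"{base_url}/_cluster/reroute",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Try the non-destructive retry first, then look at the explanation again.
    print(send(build_retry_failed_request()))
    print(send(build_explain_request("cfileindex", 8)))
```

The forced allocation is deliberately not called in the main block. It would be built with build_allocate_stale_primary_request("cfileindex", 8, "14038005774639") and should only be sent once the retry has failed and losing data on that shard is acceptable.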

========================================
Relevant entries from the log files:

[2017-06-28T00:23:27,554][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][6] received shard failed for shard id [[cfileindex][6]], allocation id [zT9iw10oRrmasz-icwRqFA], primary term [578], message [mark copy as stale]
[2017-06-28T00:23:27,568][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][9] received shard failed for shard id [[cfileindex][9]], allocation id [WFJT1sjzT9-dQiREAD8rXA], primary term [573], message [mark copy as stale]
[2017-06-28T00:23:27,645][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][4] received shard failed for shard id [[cfileindex][4]], allocation id [WFRVAMQ_SLCxisSkjbTOTQ], primary term [573], message [mark copy as stale]
[2017-06-28T00:23:27,682][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][0] received shard failed for shard id [[cfileindex][0]], allocation id [Wdrl7VAaSy2_OXPUuJBeVQ], primary term [554], message [mark copy as stale]
[2017-06-28T00:23:27,778][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][7] received shard failed for shard id [[cfileindex][7]], allocation id [N5MphZcWRq63voJ7_ZhzEA], primary term [571], message [mark copy as stale]
[2017-06-28T00:23:43,843][INFO ][o.e.c.r.DelayedAllocationService] [14038005774639] scheduling reroute for delayed shards in [27.4s] (13 delayed shards)
[2017-06-28T00:36:04,617][DEBUG][o.e.a.a.c.a.TransportClusterAllocationExplainAction] [14038005774639] explaining the allocation for [ClusterAllocationExplainRequest[index=cfileindex,shard=0,primary?=true,includeYesDecisions?=false], found shard [[cfileindex][0], node[ui1ui89bSzmlBCQ0qD7SiQ], [P], s[STARTED], a[id=4hc6JCLFSM2R1IAHU_y2vg]]
[2017-06-28T00:36:41,031][DEBUG][o.e.a.a.c.a.TransportClusterAllocationExplainAction] [14038005774639] explaining the allocation for [ClusterAllocationExplainRequest[index=cfileindex,shard=8,primary?=true,includeYesDecisions?=false], found shard [[cfileindex][8], node[null], [P], recovery_source[existing recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2017-06-25T11:15:18.550Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[cfileindex][8]: Recovery failed on {14038005774639}{ui1ui89bSzmlBCQ0qD7SiQ}{f-LtGPOSSk2X-lljpAL0DA}{10.2.32.169}{10.2.32.169:25800}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[class org.apache.lucene.store.BufferedChecksumIndexInput cannot seek backwards (pos=-16 getFilePointer()=0)]; ], allocation_status[deciders_no]]]
[2017-06-28T00:37:41,122][DEBUG][o.e.a.a.c.a.TransportClusterAllocationExplainAction] [14038005774639] explaining the allocation for [ClusterAllocationExplainRequest[index=cauditindex,shard=1,primary?=true,includeYesDecisions?=false], found shard [[cauditindex][1], node[null], [P], recovery_source[existing recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2017-06-25T11:15:18.550Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[cauditindex][1]: Recovery failed on {14038005774639}{ui1ui89bSzmlBCQ0qD7SiQ}{f-LtGPOSSk2X-lljpAL0DA}{10.2.32.169}{10.2.32.169:25800}]; nested: IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[class org.apache.lucene.store.BufferedChecksumIndexInput cannot seek backwards (pos=-16 getFilePointer()=0)]; ], allocation_status[deciders_no]]]
[2017-06-28T00:41:26,201][INFO ][o.e.c.s.ClusterService ] [14038005774639] removed {{14038005774618}{uI-QPxa6TWmUUTE-xoRc7Q}{X_Pnry62Q8aCwz_9kIWscQ}{10.2.32.165}{10.2.32.165:25800},}, reason: zen-disco-node-left({14038005774618}{uI-QPxa6TWmUUTE-xoRc7Q}{X_Pnry62Q8aCwz_9kIWscQ}{10.2.32.165}{10.2.32.165:25800}), reason(left)[{14038005774618}{uI-QPxa6TWmUUTE-xoRc7Q}{X_Pnry62Q8aCwz_9kIWscQ}{10.2.32.165}{10.2.32.165:25800} left]
[2017-06-28T00:41:26,214][INFO ][o.e.c.r.DelayedAllocationService] [14038005774639] scheduling reroute for delayed shards in [59.9s] (14 delayed shards)
[2017-06-28T00:41:27,054][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][2] received shard failed for shard id [[cfileindex][2]], allocation id [Mzm_bj5RTe-4fNXqAzk1oQ], primary term [565], message [mark copy as stale]
[2017-06-28T00:41:27,054][WARN ][o.e.c.a.s.ShardStateAction] [14038005774639] [cfileindex][1] received shard failed for shard id [[cfileindex][1]], allocation id [yqAH7hoOQz2lQUJnExpAJw], primary term [557], message [mark copy as stale]
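
To see at a glance which shard copies were marked stale, the WARN lines above can be summarized with a short script. This is only a sketch: the regular expression is written against the log format shown in this post, and the function name is made up for illustration.

```python
import re

# Matches ShardStateAction lines such as:
# "... [cfileindex][6] received shard failed for shard id [[cfileindex][6]],
#  allocation id [...], primary term [578], message [mark copy as stale]"
SHARD_FAILED = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\] received shard failed"
    r".*primary term \[(?P<term>\d+)\], message \[(?P<message>[^\]]*)\]"
)


def stale_shard_copies(log_lines):
    """Return sorted (index, shard, primary_term) tuples for copies marked stale."""
    found = set()
    for line in log_lines:
        match = SHARD_FAILED.search(line)
        if match and match.group("message") == "mark copy as stale":
            found.add((
                match.group("index"),
                int(match.group("shard")),
                int(match.group("term")),
            ))
    return sorted(found)


if __name__ == "__main__":
    import sys

    # Usage: python stale_shards.py < elasticsearch.log
    for index, shard, term in stale_shard_copies(sys.stdin):
        print(f"{index}[{shard}] primary term {term}")
```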
