Shards repeatedly become UNASSIGNED

My latest index is devops-2017.04.28. Within 30 minutes, cluster health turns red because this index has UNASSIGNED shards. If I DELETE index devops-2017.04.28, the cluster state goes back to green, but after a while it turns red again because of the same issue.

ES version: 5.3.0
Number of client nodes: 3
Number of master/data nodes: 3

Here's the information I could find:
GET _cat/shards:
devops-2017.04.28 1 p STARTED 856594 373.1mb 10.42.191.21 devops-esdata-1
devops-2017.04.28 1 r STARTED 856594 347.2mb 10.42.177.244 devops-esdata-0
devops-2017.04.28 2 p UNASSIGNED
devops-2017.04.28 2 r UNASSIGNED
devops-2017.04.28 0 p STARTED 859183 413mb 10.42.177.244 devops-esdata-0
devops-2017.04.28 0 r STARTED 859193 394mb 10.42.53.138 devops-esdata-2

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason:
devops-2017.04.28 2 p UNASSIGNED ALLOCATION_FAILED
devops-2017.04.28 2 r UNASSIGNED PRIMARY_FAILED
devops-2017.04.28 1 p STARTED
devops-2017.04.28 1 r STARTED
devops-2017.04.28 0 p STARTED
devops-2017.04.28 0 r STARTED
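
To watch for this recurring, the output above can be filtered down to the failing shards and their reasons. This is only a minimal sketch: it assumes the whitespace-separated column order requested above (index, shard, prirep, state, unassigned.reason), and the function name `unassigned_shards` is hypothetical, not part of any Elasticsearch client.

```python
# Sample output copied from the _cat/shards call above.
CAT_SHARDS = """\
devops-2017.04.28 2 p UNASSIGNED ALLOCATION_FAILED
devops-2017.04.28 2 r UNASSIGNED PRIMARY_FAILED
devops-2017.04.28 1 p STARTED
devops-2017.04.28 1 r STARTED
devops-2017.04.28 0 p STARTED
devops-2017.04.28 0 r STARTED
"""


def unassigned_shards(cat_output):
    """Return (index, shard, prirep, reason) for every UNASSIGNED row."""
    rows = []
    for line in cat_output.splitlines():
        fields = line.split()
        # STARTED rows have no unassigned.reason column, so check the length.
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            reason = fields[4] if len(fields) > 4 else "unknown"
            rows.append((fields[0], int(fields[1]), fields[2], reason))
    return rows


for index, shard, prirep, reason in unassigned_shards(CAT_SHARDS):
    kind = "primary" if prirep == "p" else "replica"
    print(f"{index} shard {shard} ({kind}): {reason}")
```

Here the replica of shard 2 is unassigned only because its primary failed, so the ALLOCATION_FAILED primary is the one to investigate.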

Logs:

[2017-04-28T04:33:53,695][ERROR][o.e.i.e.InternalEngine$EngineMergeScheduler] [devops-esdata-2] [devops-2017.04.28][0] failed to merge
org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=16777217 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/data/nodes/0/indices/BRiSyoI1SYiR74D8xaB11g/0/index/_u3_Lucene50_0.pos")))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:499) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:411) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:103) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4924) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4400) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3920) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) [lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
[2017-04-28T04:33:53,764][WARN ][o.e.i.c.IndicesClusterStateService] [devops-esdata-2] [[devops-2017.04.28][0]] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=16777217 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/data/nodes/0/indices/BRiSyoI1SYiR74D8xaB11g/0/index/_u3_Lucene50_0.pos")))
	at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:1340) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:613) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.3.0.jar:5.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=16777217 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/data/nodes/0/indices/BRiSyoI1SYiR74D8xaB11g/0/index/_u3_Lucene50_0.pos")))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:499) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:411) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:103) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4924) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4400) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3920) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
	at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99) ~[elasticsearch-5.3.0.jar:5.3.0]
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]
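
To see which file and shard each corruption report points at, the exception message can be picked apart with a regular expression. This is a rough sketch, not an official tool: the helper name `corrupt_file` is hypothetical, and it assumes the data path layout seen in the log above (`.../indices/<index UUID>/<shard number>/index/<file>`).

```python
import re

# Exception message copied from the log above.
LOG_LINE = (
    'org.apache.lucene.index.CorruptIndexException: codec footer mismatch '
    '(file truncated?): actual footer=16777217 vs expected footer=-1071082520 '
    '(resource=BufferedChecksumIndexInput(MMapIndexInput(path='
    '"/data/data/nodes/0/indices/BRiSyoI1SYiR74D8xaB11g/0/index/_u3_Lucene50_0.pos")))'
)

PATTERN = re.compile(
    r'actual footer=(?P<actual>-?\d+) vs expected footer=(?P<expected>-?\d+)'
    r'.*?path="(?P<path>[^"]+)"'
)


def corrupt_file(line):
    """Extract the footer values, file path, index UUID and shard number."""
    match = PATTERN.search(line)
    if match is None:
        return None
    path = match.group("path")
    parts = path.split("/")
    if "indices" not in parts:
        return None
    # Assumed layout: .../indices/<index UUID>/<shard number>/index/<file>
    i = parts.index("indices")
    return {
        "path": path,
        "index_uuid": parts[i + 1],
        "shard": int(parts[i + 2]),
        "actual_footer": int(match.group("actual")),
        "expected_footer": int(match.group("expected")),
    }


print(corrupt_file(LOG_LINE))
```

Running this over every CorruptIndexException line in the node logs would show whether the corrupted files are confined to one node or data path, or spread across several.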

Thanks for the help.

What version are you on?

elasticsearch 5.3.0

POST _cluster/reroute

···
"details": """shard failure, reason [refresh failed], failure CorruptIndexException[codec footer mismatch (file truncated?): actual footer=50536199 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/data/nodes/0/indices/bxCxWKeeTJmEVq55Kam7lQ/1/index/_lh6_Lucene50_0.doc")))]""",
···

Is this on the cluster that you mentioned in this other post is running GlusterFS?
