The primary shard is unassigned

Hello,
My Elasticsearch cluster health status turned red because of two unassigned shards. One of them is a primary shard, and I got the following error when I ran `GET _cluster/allocation/explain`; the other is the replica of that primary.
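For reference, the request I ran was roughly the following (the cluster address is a placeholder, and the shard-level body parameters are optional — without them the API picks the first unassigned shard):

```shell
# Ask Elasticsearch why a specific shard is unassigned.
# "localhost:9200" stands in for our actual cluster address.
curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "bsa_traffic-20230906-3",
    "shard": 0,
    "primary": true
  }'
```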

Sorry, I can't reproduce the problem, so I can only provide the screenshot of this error that I took at the time.

Then I checked the Elasticsearch log file and found the following messages:

[2023-09-06T19:25:30,886][INFO ][o.e.c.r.a.AllocationService] [bsa04_1] Cluster health status changed from [YELLOW] to [RED] (reason: [shards failed [[bsa_traffic-20230906-3][0]]]).
[2023-09-06T19:25:39,936][INFO ][o.e.c.r.a.AllocationService] [bsa04_1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[bsa_traffic-20230906-3][0]]]).
[2023-09-06T19:25:46,531][WARN ][o.e.c.r.a.AllocationService] [bsa04_1] failing shard [failed shard, shard [bsa_traffic-20230906-3][0], node[1Pk0_xS-R96rLbH-4MxC9A], [P], s[STARTED], a[id=XEba2p2lT82a-G7huwZF-w], message [shard failure, reason [refresh failed source[peer-recovery]]], failure [CorruptIndexException[compound sub-files must have a valid codec header and footer: codec header mismatch: actual header=0 vs expected header=1071082519 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/home/sdaa/elasticsearch/data/nodes/0/indices/7S-gUUd1TXG5xjb5an-0dw/0/index/_0.fdt")))]], markAsStale [true]]
org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: codec header mismatch: actual header=0 vs expected header=1071082519 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/home/sdaa/elasticsearch/data/nodes/0/indices/7S-gUUd1TXG5xjb5an-0dw/0/index/_0.fdt")))
	at org.apache.lucene.codecs.CodecUtil.verifyAndCopyIndexHeader(CodecUtil.java:287) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:92) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:5014) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:574) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:513) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:494) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:66) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:40) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.3.0.jar:8.3.0 2aa586909b911e66e1d8863aa89f173d69f86cd2 - ishan - 2019-10-25 23:10:03]
	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1603) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.index.engine.InternalEngine.refreshIfNeeded(InternalEngine.java:2744) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.index.engine.InternalEngine.newChangesSnapshot(InternalEngine.java:2616) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.index.engine.InternalEngine.estimateNumberOfHistoryOperations(InternalEngine.java:559) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.index.shard.IndexShard.estimateNumberOfHistoryOperations(IndexShard.java:1949) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:246) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:121) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:56) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:127) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:124) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.5.2.jar:7.5.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.5.2.jar:7.5.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
[2023-09-06T19:25:47,115][INFO ][o.e.c.r.a.AllocationService] [bsa04_1] Cluster health status changed from [YELLOW] to [RED] (reason: [shards failed [[bsa_traffic-20230906-3][0], [bsa_traffic-20230906-3][0]]]).

I don't really know the root cause of this error.

It was an emergency, and the only solution I could think of was to allocate an empty primary, but that caused us to lose some data. We are still looking for a better solution, and we really want to understand the root cause of this error.
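Concretely, the emergency workaround was something like the following reroute command (the node name here is a placeholder for one of our data nodes; note that `accept_data_loss` must be `true`, which is exactly the data loss I mentioned):

```shell
# Force-allocate an empty primary for the corrupted shard.
# WARNING: this discards whatever data the shard held.
# "bsa04_1" is a placeholder for the target data node's name.
curl -s -X POST "localhost:9200/_cluster/reroute?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "commands": [
      {
        "allocate_empty_primary": {
          "index": "bsa_traffic-20230906-3",
          "shard": 0,
          "node": "bsa04_1",
          "accept_data_loss": true
        }
      }
    ]
  }'
```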

Could someone help confirm why this error occurred?
