Index Health often red

Good day.
We are running Elasticsearch 7.13.3 on a single standalone server, not a cluster. One of the indices frequently turns red. We delete it, but after a while it turns red again.
Is there a way to bring an index from red back to green without deleting it? And why does the index keep turning red?

Why is it turning red? Is there anything in the Elasticsearch logs that explains this?

What type of hardware and storage are you using? What is the use case? What kind of load is the cluster under?
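
For reference, the following commands should show which indices are red and why a shard is unassigned (assuming Elasticsearch is listening on localhost:9200 without security; adjust host, port, and credentials to your setup):

    # Cluster health broken down per index
    curl -s 'localhost:9200/_cluster/health?level=indices&pretty'

    # List only the red indices
    curl -s 'localhost:9200/_cat/indices?health=red&v'

    # Ask Elasticsearch why a shard is unassigned
    # (without a request body it explains the first unassigned shard it finds)
    curl -s 'localhost:9200/_cluster/allocation/explain?pretty'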

Good day.
This is what I found in the logs:
[2022-01-17T23:42:48,615][WARN ][o.e.i.e.Engine ] [dtc-srv-elk] [tmg_fw-2022.01.17][0] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=7d8c5baf actual=ebeec7ce (resource=BufferedChecksumIndexInpu
    at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2474) [elasticsearch-7.13.3.jar:7.13.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732) [elasticsearch-7.13.3.jar:7.13.3]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.13.3.jar:7.13.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:831) [?:?]
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=7d8c5baf actual=ebeec7ce (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/lib/elasticse
    at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:547) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.checkIntegrity(BlockTreeTermsReader.java:349) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 1
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.checkIntegrity(PerFieldPostingsFormat.java:371) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdro
    at org.apache.lucene.codecs.perfield.PerFieldMergeState$FilterFieldsProducer.checkIntegrity(PerFieldMergeState.java:271) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdro
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:96) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:197) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:244) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:89) ~[elasticsearch-7.13.3.jar:7.13.3]
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:

[2022-01-17T23:42:48,659][WARN ][o.e.i.c.IndicesClusterStateService] [dtc-srv-elk] [tmg_fw-2022.01.17][0] marking and sending shard failed due to [shard failure, reason [merge failed]]
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=7d8c5baf actual=ebeec7ce (resource=BufferedChecksumIndexInpu
    ... (same stack trace as above)

[2022-01-17T23:42:48,665][WARN ][o.e.c.r.a.AllocationService] [dtc-srv-elk] failing shard [failed shard, shard [tmg_fw-2022.01.17][0], node[nQoHwDSqR6yRrwe8yF1AmQ], [P], s[STARTED], a[id=nXOIiEnsQ5qdqveJwiD
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=7d8c5baf actual=ebeec7ce (resource=BufferedChecksumIndexInpu
    ... (same stack trace as above)

Hardware:
8 CPUs, 24 GB RAM. EXT4 on SHD (SATA). Total: 1.6 TB, used: 473 GB, free: 1.1 TB (/var). Each index is roughly 25 GB.

Winlogbeat collects the data and sends it to Logstash.

Elasticsearch is installed on a single server, not a cluster.

CPU load:

The error message indicates index corruption, possibly due to storage problems. Have you checked your storage for issues? Do you have any processes that could interfere with the Elasticsearch index files?
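
If it helps, here is a rough sketch of what such a check could look like, assuming the data disk is /dev/sda, Elasticsearch was installed from the deb/rpm package, and the node is stopped while the Lucene check runs (the device name and shard path are only examples, adjust them to your system):

    # Look for disk/filesystem errors reported by the kernel
    dmesg | grep -iE 'ext4|i/o error|ata|sector'

    # SMART health of the underlying disk (needs the smartmontools package)
    smartctl -a /dev/sda

    # Low-level integrity check of one shard's Lucene index,
    # using the lucene-core jar shipped with Elasticsearch 7.13.3
    java -cp /usr/share/elasticsearch/lib/lucene-core-8.8.2.jar \
      org.apache.lucene.index.CheckIndex \
      /var/lib/elasticsearch/nodes/0/indices/<index-uuid>/0/index/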

There are no other processes affecting it.
I think storage problems can be ruled out: we tried moving the index to another partition, and the result is the same. We use four indices, and the problems seem to occur with two of them; their index files are large, up to 50 GB.
Could there be anything else besides storage issues? Or can you tell me how to check for them correctly?

Have you tried running it against some different type of storage?

Do I need to move it to different storage, or to a different type of storage?

Yesterday the virtual machine was moved to different storage. The result is the same.
I can find this error in the system log:
Jan 20 08:38:50 dtc-srv-elk kernel: [334898.963155] EXT4-fs (dm-0): Delayed block allocation failed for inode 25172412 at logical offset 2048 with max blocks 2048 with error 117
Jan 20 08:38:50 dtc-srv-elk kernel: [334898.966687] EXT4-fs (dm-0): This should not happen!! Data will be lost
Jan 20 08:38:50 dtc-srv-elk kernel: [334898.966687]
Could this be the reason the index status turns red?

Sounds like a potential problem with the storage used. Is this some kind of networked storage?
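
Given the EXT4 message above, checking the filesystem itself may also be worth a try. A minimal sketch, assuming the data volume is /dev/dm-0 (as in the kernel log) and can be taken offline, e.g. from a rescue environment:

    # Stop Elasticsearch and unmount the data volume first
    systemctl stop elasticsearch
    umount /var

    # Force a check of the EXT4 filesystem even if it is marked clean
    e2fsck -f /dev/dm-0
    # add -y (or -p) to let e2fsck repair what it finds, ideally after taking a backup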

The virtual machine was moved to different storage; a CSV is attached to it. It is not network storage; the OS sees the disk as if it were a physical disk.

You were right, there were problems on the disk. After fixing the errors, the indices no longer turn red.
Thanks!
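
For completeness, on the original question of bringing a red index back to green without deleting it: when the only (primary) copy of a shard is damaged and there are no replicas, Elasticsearch can only be told to reallocate that copy while accepting the loss of the corrupted data. A sketch, using the index, shard, and node names from the log above (last resort only):

    curl -s -X POST -H 'Content-Type: application/json' \
      'localhost:9200/_cluster/reroute?pretty' -d '
    {
      "commands": [
        { "allocate_stale_primary": {
            "index": "tmg_fw-2022.01.17",
            "shard": 0,
            "node": "dtc-srv-elk",
            "accept_data_loss": true
        } }
      ]
    }'

If the on-disk copy is too damaged the shard may simply fail again (allocate_empty_primary would then bring the index green with an empty shard), so if a snapshot exists, restoring it, or reindexing from the original source, is the safer way back to green without losing data.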

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.