Hi,
I have a script to do snapshot process set for a 5.1.2 cluster and it set to snap in every 15 min. This script first snaps if a snap is not already running and once completes successfully , it deletes old snaps from the repo . currently we are keeping last 12 hours of snap in the same repo. Over time the Total snap time has increased from < 15min to > 45 min . and i am seeing a lot of below log messages . Trying to understand what this stack means. Does this mean that this error is actually while deleting the old snaps from the repo or while taking the new snap ? And can you please suggest what could be done to fix this ?
Looking at ES code path - ChecksumBlobStoreFormat.readBlob , seems like its coming while reading the snap file. does it mean that the snap file itself is corrupted ?
[2019-04-29T16:20:31.624Z][WARN][org.elasticsearch.repositories.fs.FsRepository][c908cjr-master-9200] failed to read commit point [snap-FWBZ0G4CQvWLyzUL6opA8Q.dat]
org.elasticsearch.gateway.CorruptStateException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=1542443756 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(ChecksumBlobStoreFormat.readBlob(blob="snap-FWBZ0G4CQvWLyzUL6opA8Q.dat")))
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.buildBlobStoreIndexShardSnapshots(BlobStoreRepository.java:1172) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:1013) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:959) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:435) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$5(SnapshotsService.java:1132) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.snapshots.SnapshotsService$$Lambda$1573/2137837030.run(Unknown Source) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.1.2.jar:5.1.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=1542443756 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(ChecksumBlobStoreFormat.readBlob(blob="snap-FWBZ0G4CQvWLyzUL6opA8Q.dat")))
at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:499) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:411) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:520) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:106) ~[elasticsearch-5.1.2.jar:5.1.2]
... 10 more