We have an Elasticsearch 6.8 cluster that uses the GCP repository plugin to store snapshots. A cron job has been periodically creating snapshots of the cluster using:
http://...../_snapshot/index-09-09-2020?wait_for_completion=true
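For reference, a request like the one the cron job sends can be sketched in Python as below. This is a minimal sketch, not the actual cron job. The host `http://localhost:9200` is a placeholder because the real address is elided above. The repository name `gcp_snapshots` is inferred from the error message further down. The helpers `snapshot_url`, `snapshot_name_for` and `create_snapshot` are hypothetical, and the date format is assumed to be day-month-year. It targets the documented create-snapshot endpoint, `PUT /_snapshot/<repository>/<snapshot>?wait_for_completion=true`.

```python
import datetime
import urllib.parse
import urllib.request

# Hypothetical values: the real host is elided in the post, and the
# repository name is inferred from "[gcp_snapshots]" in the error message.
ES_HOST = "http://localhost:9200"
REPOSITORY = "gcp_snapshots"


def snapshot_name_for(day: datetime.date) -> str:
    """Date-stamped name like index-09-09-2020 (day-month-year is an assumption)."""
    return "index-" + day.strftime("%d-%m-%Y")


def snapshot_url(host: str, repository: str, snapshot: str) -> str:
    """Build the URL for PUT /_snapshot/<repository>/<snapshot>."""
    path = "/_snapshot/{}/{}".format(
        urllib.parse.quote(repository, safe=""),
        urllib.parse.quote(snapshot, safe=""),
    )
    # wait_for_completion=true makes the call block until the snapshot is done.
    query = urllib.parse.urlencode({"wait_for_completion": "true"})
    return "{}{}?{}".format(host.rstrip("/"), path, query)


def create_snapshot(host: str, repository: str, snapshot: str) -> bytes:
    """Send the create-snapshot request and return the raw JSON response body.

    urlopen raises urllib.error.HTTPError if Elasticsearch returns an error status.
    """
    request = urllib.request.Request(
        snapshot_url(host, repository, snapshot), method="PUT"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `create_snapshot(ES_HOST, REPOSITORY, snapshot_name_for(datetime.date.today()))` returns the JSON response body. If Elasticsearch answers with an error status, such as the failure below, `urlopen` raises `urllib.error.HTTPError` instead.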
It had been working for over a year when we started noticing snapshots failing with IOExceptions like this:
"reason": "[gcp_snapshots] could not read repository data from index blob",
"caused_by": {
"type": "i_o_exception",
"reason": "Unexpected end-of-input in VALUE_STRING\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@2f6768e6; line: 1, column: 2098305]",
"stack_trace": "java.io.IOException: Unexpected end-of-input in VALUE_STRING\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@2f6768e6; line: 1, column: 2098305]\n\tat com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:483)\n\tat com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:460)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._loadMoreGuaranteed(UTF8StreamJsonParser.java:2404)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2489)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2469)\n\tat com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315)\n\tat org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:83)\n\tat org.elasticsearch.repositories.RepositoryData.snapshotsFromXContent(RepositoryData.java:368)\n\tat org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:662)\n\tat org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:155)\n\tat org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:97)\n\tat org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:124)\n\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:211)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.lang.Thread.run(Thread.java:835)\n"
I managed to capture the stack trace, but it doesn't make clear what is causing the failures now. Has our backup become corrupted?
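One way to check whether the repository data itself is damaged is to copy the current `index-N` blob out of the bucket and parse it locally. For example, this could be done with `gsutil cp`, though the bucket and base path are not given here. The Elasticsearch error says the parser reached the end of the input inside a string value at column 2098305, which is what a truncated JSON document would produce. The sketch below assumes the blob is plain JSON, as the `JsonXContentParser` frame in the stack trace suggests. `check_repository_blob` and `check_repository_blob_file` are hypothetical helpers. Python's parser reports this kind of truncation as an "Unterminated string" error rather than using Elasticsearch's wording.

```python
import json
from typing import Tuple


def check_repository_blob(data: bytes) -> Tuple[bool, str]:
    """Try to parse the contents of an index-N blob as JSON.

    Returns (True, summary) when the bytes parse, (False, reason) otherwise.
    """
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError as err:
        # err.pos counts characters of the decoded text; for an unterminated
        # string it points at where that string began, not at the very end.
        return False, "invalid JSON ({}) at position {} (blob is {} bytes)".format(
            err.msg, err.pos, len(data)
        )
    except UnicodeDecodeError as err:
        return False, "not valid text: {}".format(err)
    keys = sorted(parsed) if isinstance(parsed, dict) else []
    return True, "parsed {} bytes; top-level keys: {}".format(len(data), keys)


def check_repository_blob_file(path: str) -> Tuple[bool, str]:
    """Read a locally downloaded copy of the blob and check it."""
    with open(path, "rb") as blob:
        return check_repository_blob(blob.read())
```

Calling `check_repository_blob_file` on the downloaded copy (with N replaced by the actual generation number) returns `(False, ...)` with an "Unterminated string" reason if the blob ends partway through a string value.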
I've verified that the repository is available on all nodes.
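A node check like this can be done with the verify-repository API, `POST /_snapshot/<repository>/_verify`, which lists the nodes on which verification succeeded. The sketch below is an illustration only. The helper names are hypothetical, and the repository name `gcp_snapshots` would again be inferred from the error message. It also assumes the response has the shape `{"nodes": {"<node id>": {"name": "<node name>"}}}`.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def verify_url(host: str, repository: str) -> str:
    """Build the URL for POST /_snapshot/<repository>/_verify."""
    return "{}/_snapshot/{}/_verify".format(
        host.rstrip("/"), urllib.parse.quote(repository, safe="")
    )


def verified_node_names(body: bytes) -> List[str]:
    """Return the node names listed in a verify-repository response.

    Assumes the shape {"nodes": {"<node id>": {"name": "<node name>"}}};
    falls back to the node id if a name is missing.
    """
    nodes = json.loads(body).get("nodes", {})
    return sorted(info.get("name", node_id) for node_id, info in nodes.items())


def verify_repository(host: str, repository: str) -> List[str]:
    """Ask the cluster to verify the repository; urlopen raises HTTPError on failure."""
    request = urllib.request.Request(verify_url(host, repository), method="POST")
    with urllib.request.urlopen(request) as response:
        return verified_node_names(response.read())
```

For example, `verify_repository("http://localhost:9200", "gcp_snapshots")` (placeholder host) returns a sorted list of node names. That list can be compared with the cluster's node list, for example from `GET _cat/nodes`.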