Cannot back up Elasticsearch to Google Cloud Storage

Hello,

I'm using Elasticsearch 6.4.3 and I'm trying to back up a cluster to Google Cloud Storage. I installed the plugin described at https://www.elastic.co/guide/en/elasticsearch/plugins/6.4/repository-gcs.html.

The backup contains roughly 4 TB of data.

When I take a snapshot, after a few hours I always get the following error:

[2019-11-18T15:00:24,394][WARN ][o.e.s.SnapshotShardsService] [node3] [[index_name][0]][gcs_repository:snapshot-2019-11-18/JDvfALu6Qree0sOLnsZQ1Q] failed to snapshot shard
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: com.google.cloud.storage.StorageException: Error writing request body to server
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:858) ~[elasticsearch-6.4.3.jar:6.4.3]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:410) ~[elasticsearch-6.4.3.jar:6.4.3]
at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:97) ~[elasticsearch-6.4.3.jar:6.4.3]
at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:354) [elasticsearch-6.4.3.jar:6.4.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [elasticsearch-6.4.3.jar:6.4.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.3.jar:6.4.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: com.google.cloud.storage.StorageException: Error writing request body to server
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220) ~[?:?]
at com.google.cloud.storage.spi.v1.HttpStorageRpc.write(HttpStorageRpc.java:703) ~[?:?]
at com.google.cloud.storage.BlobWriteChannel$1.run(BlobWriteChannel.java:51) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_191]
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89) ~[?:?]
at com.google.cloud.RetryHelper.run(RetryHelper.java:74) ~[?:?]
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51) ~[?:?]
at com.google.cloud.storage.BlobWriteChannel.flushBuffer(BlobWriteChannel.java:47) ~[?:?]
at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:122) ~[?:?]
at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:149) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$2.lambda$write$0(GoogleCloudStorageBlobStore.java:238) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
at org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedIOException(SocketAccess.java:44) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$2.write(GoogleCloudStorageBlobStore.java:238) ~[?:?]
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_191]
at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_191]
at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_191]
at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_191]
at org.elasticsearch.core.internal.io.Streams.copy(Streams.java:55) ~[elasticsearch-core-6.4.3.jar:6.4.3]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlobResumable(GoogleCloudStorageBlobStore.java:224) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlob(GoogleCloudStorageBlobStore.java:203) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.writeBlob(GoogleCloudStorageBlobContainer.java:68) ~[?:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshotFile(BlobStoreRepository.java:1331) ~[elasticsearch-6.4.3.jar:6.4.3]
Caused by: java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) ~[?:?]
at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:55) ~[?:?]
at com.google.api.client.util.IOUtils.copy(IOUtils.java:94) ~[?:?]
at com.google.api.client.http.AbstractInputStreamContent.writeTo(AbstractInputStreamContent.java:72) ~[?:?]
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:80) ~[?:?]
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981) ~[?:?]
at com.google.cloud.storage.spi.v1.HttpStorageRpc.write(HttpStorageRpc.java:684) ~[?:?]
at com.google.cloud.storage.BlobWriteChannel$1.run(BlobWriteChannel.java:51) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_191]
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89) ~[?:?]
at com.google.cloud.RetryHelper.run(RetryHelper.java:74) ~[?:?]
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51) ~[?:?]
at com.google.cloud.storage.BlobWriteChannel.flushBuffer(BlobWriteChannel.java:47) ~[?:?]
at com.google.cloud.BaseWriteChannel.flush(BaseWriteChannel.java:122) ~[?:?]
at com.google.cloud.BaseWriteChannel.write(BaseWriteChannel.java:149) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$2.lambda$write$0(GoogleCloudStorageBlobStore.java:238) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_191]
at org.elasticsearch.repositories.gcs.SocketAccess.doPrivilegedIOException(SocketAccess.java:44) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore$2.write(GoogleCloudStorageBlobStore.java:238) ~[?:?]
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_191]
at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_191]
at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_191]
at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_191]
at org.elasticsearch.core.internal.io.Streams.copy(Streams.java:55) ~[elasticsearch-core-6.4.3.jar:6.4.3]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlobResumable(GoogleCloudStorageBlobStore.java:224) ~[?:?]
at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlob(GoogleCloudStorageBlobStore.java:203) ~[?:?]

I'm using the plugin with its default settings. Can anyone help me fix this?

Hi @Victor_Guimaraes

Unfortunately, what you're seeing is a problem with the way GCS fails on some large uploads: the upload session for a large blob can, at a very low rate, become corrupted. We recently fixed this in https://github.com/elastic/elasticsearch/pull/45963 and the fix will be released in Elasticsearch 7.5. The best workaround I can offer for the time being is to snapshot groups of indices separately. That lowers the chance of running into the issue in any single snapshot, since statistically speaking it should only affect very large snapshots.
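
As a rough illustration of that workaround, here is a minimal Python sketch that splits a list of indices into groups and prepares one create-snapshot request per group. It is only a sketch under assumptions: the index names, the group size, the "-partN" snapshot naming, and the http://localhost:9200 address are all made up for the example, and the helper functions are hypothetical, not part of Elasticsearch or the plugin. The only Elasticsearch feature it relies on is the standard create-snapshot API (PUT /_snapshot/<repository>/<snapshot>) with the "indices" body parameter and wait_for_completion=true, so the groups run one after another.

```python
import json
import urllib.request


def group_indices(indices, group_size):
    """Split index names into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [indices[i:i + group_size] for i in range(0, len(indices), group_size)]


def build_snapshot_requests(repository, snapshot_prefix, indices, group_size):
    """Return one (path, body) pair per group for the create-snapshot API.

    The "-partN" suffix is an arbitrary naming choice for this example.
    """
    requests = []
    for n, group in enumerate(group_indices(indices, group_size), start=1):
        path = (f"/_snapshot/{repository}/{snapshot_prefix}-part{n}"
                "?wait_for_completion=true")
        body = {"indices": ",".join(group)}
        requests.append((path, body))
    return requests


def run_snapshot(base_url, path, body):
    """Send one create-snapshot request and block until it has finished."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical index names; replace them with the real ones.
    indices = ["index_a", "index_b", "index_c", "index_d", "index_e"]
    planned = build_snapshot_requests(
        "gcs_repository", "snapshot-2019-11-18", indices, group_size=2
    )
    # Dry run: print the plan instead of contacting the cluster.
    for path, body in planned:
        print("PUT", path, json.dumps(body))
        # To actually take the snapshot (assumed address):
        # run_snapshot("http://localhost:9200", path, body)
```

Each group ends up as its own snapshot in the same repository, so a failure only costs a retry of that one group rather than the whole 4 TB run. With wait_for_completion=true a single request can stay open for a long time, so if a proxy or load balancer in front of the cluster drops idle connections, it may be easier to start each snapshot without that flag and poll its status instead.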

Thank you, @Armin_Braun, for your answer. I will try taking the snapshot one index at a time.