Snapshot to S3 partially failed with error Unable to execute HTTP request: <bucket_name>.s3.us-west-2.amazonaws.com

20 out of 90 shards failed to complete the snapshot. Here is the archive output:

{
    "snapshot": {
        "snapshot": "exabeam-2021.08.10",
        "uuid": "fRjaWFFfQDK7bHbJRR0yHw",
        "version_id": 6080199,
        "version": "6.8.1",
        "indices": [
            "exabeam-2021.08.10"
        ],
        "include_global_state": false,
        "state": "PARTIAL",
        "start_time": "2021-10-23T07:37:31.934Z",
        "start_time_in_millis": 1634974651934,
        "end_time": "2021-10-23T11:26:01.069Z",
        "end_time_in_millis": 1634988361069,
        "duration_in_millis": 13709135,
        "failures": [
            {
                "index": "exabeam-2021.08.10",
                "index_uuid": "exabeam-2021.08.10",
                "shard_id": 44,
                "reason": "IndexShardSnapshotFailedException[com.amazonaws.SdkClientException: Unable to execute HTTP request: <bucketname>.s3.us-west-2.amazonaws.com]; nested: SdkClientException[Unable to execute HTTP request: <bucketname>.s3.us-west-2.amazonaws.com]; nested: UnknownHostException[<bucketname>.s3.us-west-2.amazonaws.com]; ",
                "node_id": "1cF4j6rdR8KKf9o6hgfZ9Q",
                "status": "INTERNAL_SERVER_ERROR"
            }

It may be an intermittent network connection issue, since other shards completed successfully on the same Elasticsearch nodes. I tried restarting the whole ES cluster, but the error recurred on a new snapshot.

Has anyone encountered the same problem, and how did you resolve it?
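To check the observation that other shards succeeded on the same nodes, the `failures` array in the output above can be grouped by `node_id` and by the innermost exception named in `reason`. The following is a minimal Python sketch, assuming the output has been loaded into a dict with the same shape as shown (a top-level `snapshot` object containing `failures`). Splitting `reason` on `nested:` is a heuristic based on the reason strings in this post, not a documented format.

```python
from collections import Counter
from typing import Dict


def root_cause(reason: str) -> str:
    """Return the exception name from the last 'nested:' part of a failure reason.

    Heuristic for strings like
    'IndexShardSnapshotFailedException[...]; nested: ...; nested: UnknownHostException[...]; '.
    """
    last = reason.split("nested:")[-1].strip()
    return last.split("[", 1)[0]


def summarize_failures(snapshot_output: Dict) -> Dict[str, Counter]:
    """Count shard failures per node_id, keyed by root-cause exception name."""
    by_node: Dict[str, Counter] = {}
    for failure in snapshot_output["snapshot"].get("failures", []):
        node = failure.get("node_id", "unknown")
        by_node.setdefault(node, Counter())[root_cause(failure["reason"])] += 1
    return by_node
```

If every failure maps to `UnknownHostException` and the failing shards are spread across several nodes, that would point towards name resolution rather than a single misbehaving node.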

Can you check the Elasticsearch logs?

Here is some reference information from the ES log file:

[2021-10-23T08:38:49,975][WARN ][o.e.s.SnapshotShardsService] [host5-1] [[exabeam-2021.08.10][20]][exabeam_snapshot_repo:exabeam-2021.08.10/fRjaWFFfQDK7bHbJRR0yHw] failed to snapshot shard
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: com.amazonaws.SdkClientException: Unable to execute HTTP request: <bucketname>.s3.us-west-2.amazonaws.com
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:848) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:388) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:99) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:334) [elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.jar:6.8.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: <bucketname>.s3.us-west-2.amazonaws.com
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1134) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1080) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4443) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4390) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.abortMultipartUpload(AmazonS3Client.java:3103) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$executeMultipartUpload$11(S3BlobContainer.java:290) ~[?:?]
    at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_144]
    at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.executeMultipartUpload(S3BlobContainer.java:290) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$2(S3BlobContainer.java:102) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_144]
    at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:48) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:98) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshotFile(BlobStoreRepository.java:1298) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1234) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:842) ~[elasticsearch-6.8.1.jar:6.8.1]
    ... 8 more
Caused by: java.net.UnknownHostException: <bucketname>.s3.us-west-2.amazonaws.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[?:1.8.0_144]
    at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_144]
    at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_144]
    at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27) ~[?:?]
    at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38) ~[?:?]
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111) ~[?:?]
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[?:?]
    at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) ~[?:?]
    at com.amazonaws.http.conn.$Proxy34.connect(Unknown Source) ~[?:?]
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[?:?]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[?:?]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[?:?]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[?:?]
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1256) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4443) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4390) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.abortMultipartUpload(AmazonS3Client.java:3103) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$executeMultipartUpload$11(S3BlobContainer.java:290) ~[?:?]
    at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_144]
    at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.executeMultipartUpload(S3BlobContainer.java:290) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$2(S3BlobContainer.java:102) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_144]
    at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:48) ~[?:?]
    at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:98) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshotFile(BlobStoreRepository.java:1298) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1234) ~[elasticsearch-6.8.1.jar:6.8.1]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:842) ~[elasticsearch-6.8.1.jar:6.8.1]
    ... 8 more

Perhaps it's an intermittent DNS issue?
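One way to test the DNS theory is to resolve the bucket endpoint repeatedly from an Elasticsearch node and count the failures. Below is a minimal Python sketch. The host name you pass in is your own bucket endpoint, and `probe_dns` is a hypothetical helper. Note that `socket.getaddrinfo` goes through the operating system resolver, not the JVM's `InetAddress.getAllByName` path (and its cache) that appears in the stack trace, so a clean run does not fully rule out a JVM-side problem.

```python
import socket
import time
from typing import Callable


def probe_dns(host: str, attempts: int = 60, interval_s: float = 1.0,
              resolve: Callable = socket.getaddrinfo,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Resolve `host` `attempts` times and return the number of failed lookups."""
    failures = 0
    for i in range(attempts):
        try:
            # The port does not affect name resolution; 443 is used because
            # S3 is reached over HTTPS by default.
            resolve(host, 443)
        except socket.gaierror as exc:
            failures += 1
            print(f"lookup {i + 1}/{attempts} for {host} failed: {exc}")
        if i < attempts - 1:
            sleep(interval_s)
    return failures
```

For example, `probe_dns("<bucketname>.s3.us-west-2.amazonaws.com", attempts=600)` with the real bucket name substituted would sample resolution for about ten minutes. The `resolve` and `sleep` parameters exist only so the function can be exercised without network access.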

The index is large (~1.4 TB), so this may be a performance or scalability issue. What I don't understand is that there were successful snapshots on the same ES nodes, and S3 should be able to store this amount of data reliably.

Here is the snapshot status:

{
  "snapshots" : [
    {
      "snapshot" : "exabeam-2021.08.10",
      "repository" : "exabeam_snapshot_repo",
      "uuid" : "fRjaWFFfQDK7bHbJRR0yHw",
      "state" : "SUCCESS",
      "include_global_state" : false,
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 69,
        "failed" : 21,
        "total" : 90
      },
      "stats" : {
        "incremental" : {
          "file_count" : 1339,
          "size_in_bytes" : 1407627500247
        },
        "total" : {
          "file_count" : 1339,
          "size_in_bytes" : 1407627500247
        },
        "start_time_in_millis" : 1634974652508,
        "time_in_millis" : 11770904,
        "number_of_files" : 1339,
        "processed_files" : 1339,
        "total_size_in_bytes" : 1407627500247,
        "processed_size_in_bytes" : 1407627500247
      },
      "indices" : {
        "exabeam-2021.08.10" : {
          "shards_stats" : {
            "initializing" : 0,
            "started" : 0,
            "finalizing" : 0,
            "done" : 69,
            "failed" : 21,
            "total" : 90
          },
          "stats" : {
            "incremental" : {
              "file_count" : 1339,
              "size_in_bytes" : 1407627500247
            },
            "total" : {
              "file_count" : 1339,
              "size_in_bytes" : 1407627500247
            },
            "start_time_in_millis" : 1634974652508,
            "time_in_millis" : 11770904,
            "number_of_files" : 1339,
            "processed_files" : 1339,
            "total_size_in_bytes" : 1407627500247,
            "processed_size_in_bytes" : 1407627500247
          },
          "shards" : {
            "0" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 15,
                  "size_in_bytes" : 25894495780
                },
                "total" : {
                  "file_count" : 15,
                  "size_in_bytes" : 25894495780
                },
                "start_time_in_millis" : 1634974652523,
                "time_in_millis" : 13683609,
                "number_of_files" : 15,
                "processed_files" : 15,
                "total_size_in_bytes" : 25894495780,
                "processed_size_in_bytes" : 25894495780
              }
            },
            "1" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 18,
                  "size_in_bytes" : 25894217032
                },
                "total" : {
                  "file_count" : 18,
                  "size_in_bytes" : 25894217032
                },
                "start_time_in_millis" : 1634974652530,
                "time_in_millis" : 13699525,
                "number_of_files" : 18,
                "processed_files" : 18,
                "total_size_in_bytes" : 25894217032,
                "processed_size_in_bytes" : 25894217032
              }
            },
            "2" : {
              "stage" : "FAILURE",
              "stats" : {
                "incremental" : {
                  "file_count" : 0,
                  "size_in_bytes" : 0
                },
                "total" : {
                  "file_count" : 0,
                  "size_in_bytes" : 0
                },
                "start_time_in_millis" : 0,
                "time_in_millis" : 0,
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0
              },
              "reason" : "IndexShardSnapshotFailedException[com.amazonaws.SdkClientException: Unable to execute HTTP request: s3archivingtest.s3.us-west-2.amazonaws.com]; nested: SdkClientException[Unable to execute HTTP request: s3archivingtest.s3.us-west-2.amazonaws.com]; nested: UnknownHostException[s3archivingtest.s3.us-west-2.amazonaws.com]; "
            },
            "3" : {
              "stage" : "DONE",
              "stats" : {
                "incremental" : {
                  "file_count" : 18,
                  "size_in_bytes" : 25908855448
                },
                "total" : {
                  "file_count" : 18,
                  "size_in_bytes" : 25908855448
                },
                "start_time_in_millis" : 1634974652512,
                "time_in_millis" : 13696316,
                "number_of_files" : 18,
                "processed_files" : 18,
                "total_size_in_bytes" : 25908855448,
                "processed_size_in_bytes" : 25908855448
              }
            },
    ...

Side note: your shard size seems to be about 15 GB? You might want to roughly triple that. I don't think it will solve this problem, but it should improve resource efficiency.
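The shard-size estimate can be checked against the status output above. Failed shards report zero bytes (see shard 2), so the total of 1,407,627,500,247 bytes appears to cover only the completed shards. Averaging over all 90 shards and over the 69 completed ones gives different figures, and the shards listed individually (0, 1 and 3) are each about 25.9 GB. A small Python sketch of the arithmetic, assuming the status JSON has been loaded into a dict of the shape shown (`average_shard_size_gb` is a hypothetical helper):

```python
from typing import Dict


def average_shard_size_gb(status: Dict, index: str) -> Dict[str, float]:
    """Average snapshotted bytes per shard, in decimal GB (1e9 bytes)."""
    idx = status["snapshots"][0]["indices"][index]
    total_bytes = idx["stats"]["total"]["size_in_bytes"]
    counts = idx["shards_stats"]
    return {
        # Spread over every shard, including failed ones that uploaded 0 bytes.
        "over_all_shards": total_bytes / counts["total"] / 1e9,
        # Spread only over shards that reached the DONE stage.
        "over_done_shards": total_bytes / counts["done"] / 1e9,
    }


# Figures copied from the status output in this thread.
status = {"snapshots": [{"indices": {"exabeam-2021.08.10": {
    "shards_stats": {"done": 69, "failed": 21, "total": 90},
    "stats": {"total": {"size_in_bytes": 1407627500247}},
}}}]}
print(average_shard_size_gb(status, "exabeam-2021.08.10"))
# about 15.64 GB over all 90 shards, about 20.40 GB over the 69 done shards
```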

Perhaps it's an intermittent DNS issue?

A retry may solve the issue. I will run the snapshot again on the same index and post an update. Any other clues?

Retrying did solve this issue. Thanks for the help!
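Since a plain retry resolved it, one option is to automate the retry: re-run the snapshot until it reports `SUCCESS`, using a new name each time, because snapshot names must be unique within a repository. Snapshots are incremental, so a retry should mostly reuse files that the completed shards already uploaded. Below is a minimal Python sketch using only the standard library. The cluster URL, the `-retryN` naming scheme and both helper functions are assumptions for illustration. The call uses `wait_for_completion=true`, so it blocks until the snapshot finishes; the failing run here took about 3.8 hours.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Assumed cluster address; change for your environment.
ES_URL = "http://localhost:9200"


def create_snapshot(repo: str, snapshot: str, index: str) -> Dict:
    """Snapshot one index and block until it finishes (wait_for_completion=true)."""
    body = json.dumps({"indices": index, "include_global_state": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{ES_URL}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def snapshot_with_retry(repo: str, base_name: str, index: str, attempts: int = 3,
                        delay_s: float = 300.0,
                        create: Callable[[str, str, str], Dict] = create_snapshot,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Re-run the snapshot until it reports SUCCESS; return the successful snapshot name."""
    for attempt in range(attempts):
        # Snapshot names must be unique within a repository, so each retry gets a suffix.
        name = base_name if attempt == 0 else f"{base_name}-retry{attempt}"
        info = create(repo, name, index)["snapshot"]
        state = info["state"]
        if state == "SUCCESS":
            return name
        failed = len(info.get("failures", []))
        print(f"{name}: state={state}, failed shards={failed}")
        if attempt < attempts - 1:
            sleep(delay_s)
    raise RuntimeError(f"snapshot of {index} did not succeed after {attempts} attempts")
```

For the case in this thread, `base_name` would have to differ from the existing `exabeam-2021.08.10` snapshot, since that name is already taken in `exabeam_snapshot_repo`.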
