Registering S3 repository from private subnet fails with "Unauthorized"

I'm trying to register an S3 repository for a test cluster of two instances running Elasticsearch 8.12.0 in a private AWS subnet, using IAM instance profiles instead of access keys. The subnet's security group allows outbound internet access, and aws s3 ls <bucket_name> works from the host. The following request fails:

PUT _snapshot/s3-prod-elasticsearch-snapshots
{
  "type": "s3",
  "settings": {
    "bucket": "company-name-prod-elasticsearch-snapshot-8aobgr3w"
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[s3-prod-elasticsearch-snapshots] path  is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[s3-prod-elasticsearch-snapshots] path  is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Unable to upload object [tests-e27dT1gwQJuSkXKQp-pr5g/master.dat] using a single upload",
      "caused_by": {
        "type": "amazon_service_exception",
        "reason": "Unauthorized (Service: null; Status Code: 401; Error Code: null; Request ID: null; Proxy: null)"
      }
    }
  },
  "status": 500
}

The instance profile role has the following policy:

{
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::company-name-prod-elasticsearch-snapshot-pbvc8t7w/*",
                "arn:aws:s3:::company-name-prod-elasticsearch-snapshot-pbvc8t7w"
            ]
        }
    ],
    "Version": "2012-10-17"
}

I thought it might be a networking issue, so I also created an S3 gateway endpoint and the associated routes, and verified that it works with aws s3 ls --endpoint-url https://s3.eu-west-1.amazonaws.com company-name-prod-elasticsearch-snapshot-pbvc8t7w. However, the following two requests both fail with the same error as above:

PUT _snapshot/s3-prod-elasticsearch-snapshots
{
  "type": "s3",
  "settings": {
    "bucket": "company-name-prod-elasticsearch-snapshot-8aobgr3w",
    "endpoint": "s3.eu-west-1.amazonaws.com"
  }
}

PUT _snapshot/s3-prod-elasticsearch-snapshots
{
  "type": "s3",
  "settings": {
    "bucket": "company-name-prod-elasticsearch-snapshot-8aobgr3w",
    "endpoint": "s3.eu-west-1.amazonaws.com",
    "server_side_encryption": "true"
  }
}

elasticsearch.yml seems to contain nothing of interest:

cluster.name: 'prod-cluster'
node.name: 'elasticsearch-1'
network.bind_host: 0.0.0.0
network.publish_host: 10.0.1.50
http.port: 9200
transport.port: 9300
s3.client.default.endpoint: s3.eu-west-1.amazonaws.com
discovery.seed_hosts: '10.0.1.223'
cluster.initial_master_nodes:
  - 10.0.1.50
  - 10.0.1.223
xpack.license.self_generated.type: basic
xpack.monitoring.collection.enabled: true
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.certificate_authorities: /usr/share/elasticsearch/config/certificates/ca/ca.crt
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/certificates/elasticsearch-1/elasticsearch-1.crt
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/certificates/elasticsearch-1/elasticsearch-1.key

What am I missing here, and how can I get more debug information?

Prettified ES logs when attempting to register look like this:


{
    "@timestamp": "2024-02-05T11:40:40.033Z",
    "log.level": "WARN",
    "message": "path: /_snapshot/s3-prod-elasticsearch-snapshots, params: {pretty=true, repository=s3-prod-elasticsearch-snapshots}, status: 500",
    "ecs.version": "1.2.0",
    "service.name": "ES_ECS",
    "event.dataset": "elasticsearch.server",
    "process.thread.name": "elasticsearch[elasticsearch-1][snapshot][T#1]",
    "log.logger": "rest.suppressed",
    "trace.id": "9f068453b484ef8b4d29927a790fb969",
    "elasticsearch.cluster.uuid": "lB2U_g1GTi-RGAbh6O5gbA",
    "elasticsearch.node.id": "mVuibVsMSIep5xhd_kkbsg",
    "elasticsearch.node.name": "elasticsearch-1",
    "elasticsearch.cluster.name": "prod-cluster",
    "error.type": "org.elasticsearch.repositories.RepositoryVerificationException",
    "error.message": "[s3-prod-elasticsearch-snapshots] path  is not accessible on master node",
    "error.stack_trace": "org.elasticsearch.repositories.RepositoryVerificationException: [s3-prod-elasticsearch-snapshots] path  is not accessible on master node
    Caused by: java.io.IOException: Unable to upload object [tests-VCUCRktrQmm-azZOg9nSrQ/master.dat] using a single upload
        at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:440)
        at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$1(S3BlobContainer.java:136)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
        at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:37)
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:134)
        at org.elasticsearch.server@8.12.0/org.elasticsearch.common.blobstore.BlobContainer.writeBlob(BlobContainer.java:121)
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlobAtomic(S3BlobContainer.java:279)
        at org.elasticsearch.server@8.12.0/org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:2004)
        at org.elasticsearch.server@8.12.0/org.elasticsearch.repositories.RepositoriesService$4.doRun(RepositoriesService.java:499)
        at org.elasticsearch.server@8.12.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
        at org.elasticsearch.server@8.12.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
    Caused by: com.amazonaws.AmazonServiceException: Unauthorized (Service: null; Status Code: 401; Error Code: null; Request ID: null; Proxy: null)
        at com.amazonaws.internal.EC2ResourceFetcher.handleErrorResponse(EC2ResourceFetcher.java:149)
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:94)
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:70)
        at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:75)
        at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66)
        at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsEndpoint(InstanceMetadataServiceCredentialsFetcher.java:60)
        at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsResponse(InstanceMetadataServiceCredentialsFetcher.java:48)
        at com.amazonaws.auth.BaseCredentialsFetcher.fetchCredentials(BaseCredentialsFetcher.java:124)
        at com.amazonaws.auth.BaseCredentialsFetcher.getCredentials(BaseCredentialsFetcher.java:80)
        at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:166)
        at com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.getCredentials(EC2ContainerCredentialsProviderWrapper.java:75)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:319)
        at org.elasticsearch.repositories.s3.SocketAccess.doPrivileged(SocketAccess.java:31)
        at org.elasticsearch.repositories.s3.S3Service$PrivilegedAWSCredentialsProvider.getCredentials(S3Service.java:300)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
        at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:421)
        at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6531)
        at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1861)
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1821)
        at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$executeSingleUpload$16(S3BlobContainer.java:438)
        at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:46)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:319)
        at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:45)
        at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:438)
        ... 13 more"
}

The stack trace shows that the 401 Unauthorized does not come from S3 at all: Elasticsearch gets it from the instance metadata service while trying to retrieve the credentials it would use to talk to S3.

Are you using IMDSv2? Does it still have this problem with IMDSv1? I'm not sure whether IMDSv2 is supported by repository-s3 or not.
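
If it helps to check this from the host, here is a rough sketch (not what repository-s3 or the AWS SDK actually runs) of how a tokenless IMDSv1-style request differs from the two-step IMDSv2 flow. It uses only the Python standard library. The helper names are made up for illustration, and it assumes the standard metadata address 169.254.169.254 is reachable from the instance.

```python
import urllib.error
import urllib.request

# Standard link-local address of the EC2 instance metadata service.
IMDS_BASE = "http://169.254.169.254"
# Path that lists the role name attached via the instance profile.
ROLE_PATH = "/latest/meta-data/iam/security-credentials/"


def build_v1_request(base=IMDS_BASE):
    """IMDSv1-style request: a plain GET with no session token."""
    return urllib.request.Request(base + ROLE_PATH, method="GET")


def build_token_request(base=IMDS_BASE, ttl_seconds=60):
    """IMDSv2 step 1: PUT to the token endpoint to obtain a session token."""
    return urllib.request.Request(
        base + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_v2_request(token, base=IMDS_BASE):
    """IMDSv2 step 2: the same GET, now carrying the session token."""
    return urllib.request.Request(
        base + ROLE_PATH,
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def status_of(request, timeout=2):
    """Return the HTTP status code, including error statuses such as 401."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def probe(base=IMDS_BASE, timeout=2):
    """Compare the tokenless and token-based requests (run on the instance)."""
    tokenless = status_of(build_v1_request(base), timeout)
    with urllib.request.urlopen(build_token_request(base), timeout=timeout) as resp:
        token = resp.read().decode("ascii")
    with_token = status_of(build_v2_request(token, base), timeout)
    return tokenless, with_token
```

Calling probe() on the instance returns the two status codes. If the tokenless request comes back 401 while the token-based one succeeds, the instance is enforcing IMDSv2, which would match the EC2ResourceFetcher frames in the stack trace above.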


Yes, this was it! Setting metadata_options { http_tokens = "optional" } in Terraform immediately resolved the issue. Thanks a ton, I was really pulling my hair out over this one.
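
For anyone not managing instances with Terraform, the same option is exposed through the EC2 API as HttpTokens. Below is a minimal sketch, assuming ec2_client is a boto3 EC2 client (boto3.client("ec2")) and using a placeholder instance ID; the two function names are my own. Note that "optional" re-enables tokenless IMDSv1 requests, so treat it as a workaround rather than a fix.

```python
def current_http_tokens(ec2_client, instance_id):
    """Return 'required' (IMDSv2 only) or 'optional' (IMDSv1 also allowed)."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    return instance["MetadataOptions"]["HttpTokens"]


def allow_tokenless_imds(ec2_client, instance_id):
    """Set HttpTokens to 'optional', mirroring http_tokens = "optional"."""
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpTokens="optional",
        HttpEndpoint="enabled",
    )


# Hypothetical usage (the instance ID is a placeholder, not from this thread):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   if current_http_tokens(ec2, "i-0123456789abcdef0") == "required":
#       allow_tokenless_imds(ec2, "i-0123456789abcdef0")
```

If the instance is managed by Terraform, changing it this way will show up as drift on the next plan, so the metadata_options block is still the place to make the change permanent.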

Where can I watch to see when this functionality becomes available in repository-s3?

I think you're the first person to encounter this, or at least I don't see an issue about it on GitHub today, so I opened one here:
