Cannot Register Snapshot Repository for S3-Compatible (Backblaze) Service

I've looked at the other tickets for similar issues and have not yet found a solution or even suggestion for a next step.

On Elastic Cloud 7.15 I have a repo set up to Backblaze and it's working well enough (a little slow, but :man_shrugging:)...
On on-prem Elasticsearch 8.15, when I attempt to configure the repo it fails immediately with the following:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[backblaze] path  is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[backblaze] path  is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Unable to upload object [tests-poJcFn-YSBCXBlSMlWqpxw/master.dat] using a single upload",
      "caused_by": {
        "type": "sdk_client_exception",
        "reason": "sdk_client_exception: Failed to connect to service endpoint: ",
        "caused_by": {
          "type": "i_o_exception",
          "reason": "Connect timed out"
        }
      }
    }
  },
  "status": 500
}

The logs show a similar stack trace:

[2024-08-26T23:20:02,161][WARN ][r.suppressed             ] [elastic-tiebreaker] path: /_snapshot/backblaze, params: {repository=backblaze}, status: 500
org.elasticsearch.transport.RemoteTransportException: [elastic1][10.10.10.69:9300][cluster:admin/repository/put]
Caused by: org.elasticsearch.repositories.RepositoryVerificationException: [backblaze] path  is not accessible on master node
Caused by: java.io.IOException: Unable to upload object [tests-ZX2GBMIhQFuPJU79lCqUlw/master.dat] using a single upload
        at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:460) ~[?:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$1(S3BlobContainer.java:138) ~[?:?]
        at java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
        at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:37) ~[?:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:136) ~[?:?]
        at org.elasticsearch.common.blobstore.BlobContainer.writeBlob(BlobContainer.java:123) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlobAtomic(S3BlobContainer.java:298) ~[?:?]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:2154) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.repositories.RepositoriesService.lambda$validatePutRepositoryRequest$11(RepositoriesService.java:361) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:36) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.15.0.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1570) ~[?:?]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Failed to connect to service endpoint:
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100) ~[?:?]
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:70) ~[?:?]
        at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:75) ~[?:?]
        at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66) ~[?:?]
        at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsEndpoint(InstanceMetadataServiceCredentialsFetcher.java:60) ~[?:?]
...skipping...

I have followed as many different tutorials as I can find. The common steps are:

  1. Add the app key and secret to the ES keystore using the CLI and use the /_nodes/reload_secure_settings API to sync the nodes.
  2. Use the elasticsearch.yml config to add the client settings (I couldn't find a tutorial that made it totally clear what I needed to add).
  3. Use the snapshot API or Kibana to register the repo (both fail immediately).

The JSON I'm using to register via the API (I've used the simplest version with minimal settings; this is the latest test):

{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-onprem",
    "endpoint": "s3.us-west-004.backblazeb2.com",
    "region": "us-west-004",
    "compress": "true",
    "server_side_encryption": "true",
    "client": "default",
    "path_style_access" : "true",
    "protocol": "https",
    "max_retries": "10",
    "read_timeout": "2m"
  }
}

Relevant elasticsearch.yml config:

s3.client.backblaze.endpoint: "s3.us-west-004.backblazeb2.com"
s3.client.backblaze.path_style_access: true
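
(For reference, I understand other non-secure client options can also live here as s3.client.backblaze.* settings; a rough sketch with values copied from my repo JSON, not something I've confirmed fixes anything:)

s3.client.backblaze.endpoint: "s3.us-west-004.backblazeb2.com"
s3.client.backblaze.path_style_access: true
s3.client.backblaze.protocol: "https"
s3.client.backblaze.read_timeout: "2m"
s3.client.backblaze.max_retries: 10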

I've tried using s3.client.default too and there is no difference; everything fails.

I reached out to Backblaze, who suggested increasing the timeout, but when I try to configure the repo it fails immediately and doesn't respect the timeouts/retries.

My hunch is it has something to do with authentication. Backblaze says:

Authentication

The S3-Compatible API supports only v4 signatures for authentication, and v2 signatures are not supported at this time. To learn more about S3 authentication, see this article.

Pointing to "AWS Signature Version 4 for API requests"

How can I ensure the Java SDK is using v4 signatures?

I think you need "client": "backblaze" here. At least, assuming you set s3.client.backblaze.access_key and s3.client.backblaze.secret_key in your keystore.

Also please read this section of the docs carefully re. non-AWS implementations of the S3 API. I know nothing in particular about Backblaze's offering, but there are lots of other folks that claim to offer an S3-compatible API despite demonstrable incompatibilities. Tread carefully.
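
In other words, something along these lines (same bucket and endpoint as in your post, everything else stripped back; untested on my side):

{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-onprem",
    "client": "backblaze",
    "endpoint": "s3.us-west-004.backblazeb2.com"
  }
}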

Thank you @DavidTurner, I have configured both s3.client.backblaze.* and s3.client.default.* clients while on this path of discovery and failure.

I had created only the backblaze client at first (no default) on my 8.15 install, and after this post I wondered whether my Elastic Cloud 7.15 deployment worked with the backblaze client because it was already configured with a default client prior to adding backblaze... just trying a little of everything.

I have read that section (non-AWS services)... The weird part is it works in v7.15, so it felt like I could trust Backblaze to be compliant enough.

Another super interesting note: a Google search for "elasticsearch backblaze plugin" brings me to a GitHub issue about it from 2021:

which points to another related issue:

which points to an issue from just 3 weeks ago:

I get that these are Elastic-internal things, but it seems like if they can do it I should be able to do it. Just like them, I want to migrate more backups to Backblaze to reduce our costs!

Do pgulley or rahulbot ever hang out on these forums? Maybe they have an insight. I just commented on one of those GitHub issues in the hope there can be more support for Backblaze. I hope I didn't overstep any lines.

Are you sure you set these settings in the keystore and not just in elasticsearch.yml? The exception you've shared indicates that ES has not seen any access_key or secret_key settings, and these settings live in the keystore.

@ivanlawrence I'm on the mediacloud team and did the initial testing with B2. We're using (from Python) repo_type = "s3" and endpoint = "https://s3.BACKBLAZE_REGION.backblazeb2.com", but NOT supplying region (which could trigger some client code to generate an AWS endpoint URL) on the create_repository call: S3 repository | Elasticsearch Guide [8.15] | Elastic
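
(A rough curl equivalent of that call, if it helps; the bucket name is a placeholder and the client name assumes your keystore entries are under backblaze:)

curl -k -u 'elastic:PASSWORD' -H 'Content-Type: application/json' \
  -XPUT 'https://localhost:9200/_snapshot/backblaze?pretty' --data '
{
  "type": "s3",
  "settings": {
    "bucket": "YOUR-BUCKET",
    "client": "backblaze",
    "endpoint": "https://s3.us-west-004.backblazeb2.com"
  }
}'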


Ah yeah that might help indeed.

Thank you @Phil_Budne for responding!

I have just tried both an empty string and removing region from the config; both have the same failed 500 result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_verification_exception",
        "reason" : "[backblaze] path  is not accessible on master node"
      }
    ],
    "type" : "repository_verification_exception",
    "reason" : "[backblaze] path  is not accessible on master node",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Unable to upload object [tests-uwsEWkq3SIy1FdB5mnXcMA/master.dat] using a single upload",
      "caused_by" : {
        "type" : "sdk_client_exception",
        "reason" : "sdk_client_exception: Failed to connect to service endpoint: ",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Connect timed out"
        }
      }
    }
  },
  "status" : 500
}

It feels like it shouldn't be this hard and that maybe I have missed something else in the setup process... but what exactly remains a mystery.

@DavidTurner, thank you for replying.

I just tried two things, "region": "" and removing region from the config entirely, and there is no change in the response or in the error messages in /var/log/elasticsearch/elasticsearch.log.

It feels like something outside of this config might be the issue? I can confirm all nodes can resolve DNS for the endpoint. I'm going to try to set up rclone on a system and see if I can connect outside of Elasticsearch.

I'm even considering trying the Java S3 SDK outside of Elasticsearch... but all of this feels like something simple must be missing... any guesses as to where I can look next? I guess I could create a throwaway S3 bucket as a sanity check, assuming it won't cost an arm and a leg :wink:
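
(For the rclone test, a minimal sketch of what I have in mind, reusing the same application key pair; the remote name b2s3 is arbitrary:)

# ~/.config/rclone/rclone.conf
[b2s3]
type = s3
provider = Other
access_key_id = <your-application-key-id>
secret_access_key = <your-application-key>
endpoint = https://s3.us-west-004.backblazeb2.com

# list buckets to confirm credentials and connectivity
rclone lsd b2s3: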

Looks like a connectivity problem.
What happens if you run curl against the B2 endpoint URL from the ES server?

This bit of stack trace...

... says that it's failing while it's trying to use the instance metadata service to get credentials, which means it's not seeing any other credentials. That's why I think you are not setting the access_key and secret_key correctly in the keystore.
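
(A quick way to check, sketched here with the default Debian paths: run this on each node and look for the s3.client.* entries.)

/usr/share/elasticsearch/bin/elasticsearch-keystore list
# should include, among other entries:
#   s3.client.backblaze.access_key
#   s3.client.backblaze.secret_key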

This is how I "confirmed DNS resolution"

root@elastic-tiebreaker:~# curl --location 'https://s3.us-west-004.backblazeb2.com/' -i
HTTP/1.1 403
Server: nginx
Date: Tue, 27 Aug 2024 17:08:38 GMT
Content-Type: application/xml
Content-Length: 180
Connection: keep-alive
x-amz-request-id: e7b6a2f4aabb0236
x-amz-id-2: adeduW2sBbjtvUneMbgI=
Cache-Control: max-age=0, no-cache, no-store
Strict-Transport-Security: max-age=63072000

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
    <Code>AccessDenied</Code>
    <Message>Unauthenticated requests are not allowed for this api</Message>
</Error>

I'm assuming it's the real server responding, since I don't have much else to go on besides the presence of the response headers x-amz-request-id and x-amz-id-2.

Thank you... I am more than willing to set the keys again. But @DavidTurner, can you quickly confirm whether you think these are the correct steps? (This is what I think I did before, and repeating the same steps might give the same results.)

I add the keys; the Backblaze intro to the S3-compatible API says:

Access Key = <your-application-key-id>
Secret Key = <your-application-key>

which I do on my tiebreaker node:

/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key
# also the default to the same just in case
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

Then sync the nodes

curl --location --request POST -k 'https://10.10.10.65:9200/_nodes/reload_secure_settings'
#response
{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "<id>": {
      "name": "elastic2"
    },
    "<id>": {
      "name": "elastic-tiebreaker"
    },
    "<id>": {
      "name": "elastic1"
    }
  }
}

(not sure if I need to sanitize the id hash but I did)

  1. Do I need to visit each node and do something there, e.g. confirm the key is in the keystore by running a list or show command?
  2. Do I need to copy the client config settings to the elasticsearch.yml file?
    • If yes, which s3.client settings are needed in that file?
    • Do I use dot notation or YAML syntax for these settings in the YAML file?
  3. Do I need to copy the repository config to the nodes? I assumed that by using the API it would be distributed for me.

Ah. curl confirms DNS resolution, TCP connectivity, AND that the server is responding, as long as it was run from the server that generated the "Connect timed out" error.

They look correct indeed. The only thing I can think of that might be wrong is that /usr/share/elasticsearch/bin/elasticsearch-keystore is editing a different keystore from the one that ES is looking at. You need to run elasticsearch-keystore in exactly the same environment as the one in which Elasticsearch itself runs. In particular, if you set ES_PATH_CONF to specify a particular config path when running ES then you need to set the same environment variable for elasticsearch-keystore. You can also confirm that it's editing the right file by looking at the file timestamps.
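
For instance (just a sketch, assuming the Debian package's default config path), you could run the keystore tool with the same config path and compare the file timestamp afterwards:

ES_PATH_CONF=/etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore list
ls -l /etc/elasticsearch/elasticsearch.keystore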

Thank you, your prompting helped! I googled and ended up at the instructions pointing out the config files where the environment variables are found/defined.

I have not changed any environment variables from the defaults (I'm using the Debian package on Ubuntu 24.04).

root@elastic-tiebreaker:~# cat /etc/default/elasticsearch
################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#ES_JAVA_HOME=

# Elasticsearch configuration directory
# Note: this setting will be shared with command-line tools
ES_PATH_CONF=/etc/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

I just updated the s3.client.backblaze.* keys and checked the timestamp on the keystore:

root@elastic-tiebreaker:~# ls -lah /etc/elasticsearch/elasticsearch.keystore
-rw-rw---- 1 root elasticsearch 691 Aug 26 23:37 /etc/elasticsearch/elasticsearch.keystore
root@elastic-tiebreaker:~# /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key
Setting s3.client.backblaze.secret_key already exists. Overwrite? [y/N]y
Enter value for s3.client.backblaze.secret_key:
root@elastic-tiebreaker:~# /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
Setting s3.client.backblaze.access_key already exists. Overwrite? [y/N]y
Enter value for s3.client.backblaze.access_key:
root@elastic-tiebreaker:~# ls -lah /etc/elasticsearch/elasticsearch.keystore
-rw-rw---- 1 root elasticsearch 691 Aug 27 20:05 /etc/elasticsearch/elasticsearch.keystore

I then run the secure-settings reload, which says all is good:

curl --location --request POST -k 'https://10.10.10.65:9200/_nodes/reload_secure_settings' --header 'Content-Type: application/json' --header 'Authorization: ApiKey <key_hash>'
{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "<id>": {
      "name": "elastic-tiebreaker"
    },
    "<id>": {
      "name": "elastic2"
    },
    "<id>": {
      "name": "elastic1"
    }
  }
}

But when I check the timestamps of the other master nodes' keystores, they are not updating:

❯ ssh elastic1.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 536 Aug 11 03:33 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 439 Aug 15 23:07 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.68 closed.
### ran the sync on the other node ###
❯ ssh elastic1.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 536 Aug 11 03:33 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 439 Aug 15 23:07 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.68 closed.

Those keystores don't actually have the client info stored at all

❯ ssh elastic1.lxd '/usr/share/elasticsearch/bin/elasticsearch-keystore list'
autoconfiguration.password_hash
keystore.seed
xpack.security.http.ssl.keystore.secure_password
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd '/usr/share/elasticsearch/bin/elasticsearch-keystore list'
keystore.seed
xpack.security.http.ssl.keystore.secure_password
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
Connection to 10.10.10.68 closed.

Since we're this deep into it, I will await your guidance, @DavidTurner, as to the "right" way to sync the keystores, because I can for sure just ssh to each node and set them manually if needed. But it seems like the issue is likely that there is no client auth info in those keystores!

Right, yes, that would explain it indeed. Secrets in the keystore are not replicated across nodes; you need to set them up on every node. See these docs for more information:

These [Secure] settings, just like the regular ones in the elasticsearch.yml config file, need to be specified on each node in the cluster. Currently, all secure settings are node-specific settings that must have the same value on every node.
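
(i.e. on every node, not just the tiebreaker, run something like the following, assuming the default Debian paths; a single reload_secure_settings call afterwards reaches all nodes:)

/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key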

I opened Report more information about keystore contents on reload · Issue #112268 · elastic/elasticsearch · GitHub because I think you would have spotted this issue sooner had we included more information in the reload-secure-settings response.

@DavidTurner you are the best, thank you!

Let me know if I should comment on the GitHub issue to help give it a little traction or context... I assume your word is more valuable than mine.

After re-reading the docs you referenced, I visited each node and added the config to the keystore. I verified the file timestamps and headed back to the tiebreaker. I ran the /_snapshot API and was smacked in the face with the same error/failure.

I went back to each node and added the s3.client.default.* keys even though I didn't want to use them... back to the tiebreaker, and more failure. :poop:

I re-re-read the docs, and since I needed to visit each node anyway, maybe I needed to run the /_nodes/reload_secure_settings command against each node too, so I did that...

Then it dawned on me: maybe instead of trying to load the snapshot config on the tiebreaker I should send it over to a different node. So instead of:

curl -XPUT 'https://localhost:9200/_snapshot/backblaze?pretty' --data '@/etc/elasticsearch/backblaze_repository'

I sent that same request to the second node (not the master):

root@elastic-tiebreaker:~# curl -k -u 'elastic:password' -H'Content-Type: application/json' -XPUT 'https://10.10.10.68:9200/_snapshot/backblaze?pretty' --data '@/etc/elasticsearch/backblaze_repository'
{
  "acknowledged" : true
}

:astonished:

What could have been done differently that I didn't try?

  1. Once the keystores were manually updated across the cluster, maybe I could have used Kibana to register the repo, but I didn't try it.
  2. I could have tried registering the repo on the tiebreaker but not through localhost, since elasticsearch.yml is all set up for the IP?

Likely solution steps:

  1. Ensure the keystore on each node is manually updated with the client keys.
  2. Send the /_snapshot request to a master-eligible node that isn't voting-only (maybe the ingest role is needed?).

:muscle: :tada: :pray:
I can confirm the first snapshot is on its way to Backblaze B2!

Thank you all for your time and talents.