Cannot Register Snapshot Repository for S3-Compatible (Backblaze) Service

I've looked at the other tickets for similar issues and have not yet found a solution or even suggestion for a next step.

On Elastic Cloud 7.15 I have a repo set up to Backblaze and it's working well enough (a little slow, but :man_shrugging:)...
On on-prem Elasticsearch 8.15, when I attempt to configure the repo it fails immediately with the following:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[backblaze] path  is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[backblaze] path  is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Unable to upload object [tests-poJcFn-YSBCXBlSMlWqpxw/master.dat] using a single upload",
      "caused_by": {
        "type": "sdk_client_exception",
        "reason": "sdk_client_exception: Failed to connect to service endpoint: ",
        "caused_by": {
          "type": "i_o_exception",
          "reason": "Connect timed out"
        }
      }
    }
  },
  "status": 500
}

The logs show a similar stack trace:

[2024-08-26T23:20:02,161][WARN ][r.suppressed             ] [elastic-tiebreaker] path: /_snapshot/backblaze, params: {repository=backblaze}, status: 500
org.elasticsearch.transport.RemoteTransportException: [elastic1][10.10.10.69:9300][cluster:admin/repository/put]
Caused by: org.elasticsearch.repositories.RepositoryVerificationException: [backblaze] path  is not accessible on master node
Caused by: java.io.IOException: Unable to upload object [tests-ZX2GBMIhQFuPJU79lCqUlw/master.dat] using a single upload
        at org.elasticsearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:460) ~[?:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$writeBlob$1(S3BlobContainer.java:138) ~[?:?]
        at java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
        at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:37) ~[?:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:136) ~[?:?]
        at org.elasticsearch.common.blobstore.BlobContainer.writeBlob(BlobContainer.java:123) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.repositories.s3.S3BlobContainer.writeBlobAtomic(S3BlobContainer.java:298) ~[?:?]
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:2154) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.repositories.RepositoriesService.lambda$validatePutRepositoryRequest$11(RepositoriesService.java:361) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.action.ActionRunnable$1.doRun(ActionRunnable.java:36) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.15.0.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.15.0.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1570) ~[?:?]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Failed to connect to service endpoint:
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100) ~[?:?]
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:70) ~[?:?]
        at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:75) ~[?:?]
        at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66) ~[?:?]
        at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsEndpoint(InstanceMetadataServiceCredentialsFetcher.java:60) ~[?:?]
...skipping...

I have followed as many different tutorials as I can find. The common steps are:

  1. Add the app key and secret to the ES keystore using the CLI and use the /_nodes/reload_secure_settings API to sync the nodes.
  2. Use the elasticsearch.yml config to add the client settings (I couldn't find a tutorial that made it totally clear what I needed to add).
  3. Use the snapshot API or Kibana to register the repo (both fail immediately).

The JSON I'm using to register via the API (I've used the simplest version with minimal settings; this is the latest test):

{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-onprem",
    "endpoint": "s3.us-west-004.backblazeb2.com",
    "region": "us-west-004",
    "compress": "true",
    "server_side_encryption": "true",
    "client": "default",
    "path_style_access" : "true",
    "protocol": "https",
    "max_retries": "10",
    "read_timeout": "2m"
  }
}

Relevant elasticsearch.yml config:

s3.client.backblaze.endpoint: "s3.us-west-004.backblazeb2.com"
s3.client.backblaze.path_style_access: true
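
(For reference, I understand other non-secure client options can also live here as s3.client.backblaze.* settings; a rough sketch with values copied from my repo JSON, not something I've confirmed fixes anything:)

s3.client.backblaze.endpoint: "s3.us-west-004.backblazeb2.com"
s3.client.backblaze.path_style_access: true
s3.client.backblaze.protocol: "https"
s3.client.backblaze.read_timeout: "2m"
s3.client.backblaze.max_retries: 10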

I've tried using s3.client.default too and there is no difference; everything fails.

I reached out to Backblaze, who suggested increasing the timeout, but when I try to configure the repo it fails immediately and doesn't respect the timeouts/retries.

My hunch is it has something to do with authentication. Backblaze says:

Authentication

The S3-Compatible API supports only v4 signatures for authentication, and v2 signatures are not supported at this time. To learn more about S3 authentication, see this article.

Pointing to "AWS Signature Version 4 for API requests"

How can I ensure the Java SDK is using v4 signatures?

I think you need "client": "backblaze" here. At least, assuming you set s3.client.backblaze.access_key and s3.client.backblaze.secret_key in your keystore.

Also please read this section of the docs carefully re. non-AWS implementations of the S3 API. I know nothing in particular about Backblaze's offering, but there are lots of other folks that claim to offer an S3-compatible API despite demonstrable incompatibilities. Tread carefully.
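
In other words, something along these lines (same bucket and endpoint as in your post, everything else stripped back; untested on my side):

{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-onprem",
    "client": "backblaze",
    "endpoint": "s3.us-west-004.backblazeb2.com"
  }
}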

Thank you @DavidTurner, I have configured both s3.client.backblaze.* and s3.client.default.* clients while on this path of discovery and failure.

I had created only the backblaze client at first (no default) on my 8.15 install, and after this post I wondered whether my Elastic Cloud 7.15 deployment worked with the backblaze client because it was already configured with a default client prior to adding backblaze... just trying a little of everything.

I have read that section (non-AWS services)... The weird part is it works in v7.15, so it felt like I could trust Backblaze to be compliant enough.

Another super interesting note: a Google search for "elasticsearch backblaze plugin" brings me to a GitHub issue about it from 2021:

which points to another related issue:

which points to an issue from just 3 weeks ago:

I get that these are Elastic-internal things, but it seems like if they can do it I should be able to do it. Just like them, I want to migrate more backups to Backblaze to reduce our costs!

Do pgulley or rahulbot ever hang out on these forums? Maybe they have an insight. I just commented on one of those GitHub issues in the hope there can be more support for Backblaze. I hope I didn't overstep any lines.

Are you sure you set these settings in the keystore and not just in elasticsearch.yml? The exception you've shared indicates that ES has not seen any access_key or secret_key settings, and these settings live in the keystore.

@ivanlawrence I'm on the mediacloud team and did the initial testing with B2. We're using (from Python) repo_type = "s3" and endpoint = "https://s3.BACKBLAZE_REGION.backblazeb2.com", but NOT supplying region (which could trigger some client code to generate an AWS endpoint URL) on the create_repository call: S3 repository | Elasticsearch Guide [8.15] | Elastic
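
(A rough curl equivalent of that call, if it helps; the bucket name is a placeholder and the client name assumes your keystore entries are under backblaze:)

curl -k -u 'elastic:PASSWORD' -H 'Content-Type: application/json' \
  -XPUT 'https://localhost:9200/_snapshot/backblaze?pretty' --data '
{
  "type": "s3",
  "settings": {
    "bucket": "YOUR-BUCKET",
    "client": "backblaze",
    "endpoint": "https://s3.us-west-004.backblazeb2.com"
  }
}'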


Ah yeah that might help indeed.

Thank you @Phil_Budne for responding!

I have just tried both an empty string and removing region from the config; both have the same failed 500 result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_verification_exception",
        "reason" : "[backblaze] path  is not accessible on master node"
      }
    ],
    "type" : "repository_verification_exception",
    "reason" : "[backblaze] path  is not accessible on master node",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Unable to upload object [tests-uwsEWkq3SIy1FdB5mnXcMA/master.dat] using a single upload",
      "caused_by" : {
        "type" : "sdk_client_exception",
        "reason" : "sdk_client_exception: Failed to connect to service endpoint: ",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Connect timed out"
        }
      }
    }
  },
  "status" : 500
}

It feels like it shouldn't be this hard and that maybe I have missed something else in the setup process... but what exactly remains a mystery.

@DavidTurner, thank you for replying.

I just tried two things, "region": "" and removing region from the config entirely, and there is no change in the response or in the error messages in /var/log/elasticsearch/elasticsearch.log.

It feels like something outside of this config might be the issue? I can confirm all nodes can resolve DNS for the endpoint. I'm going to try to set up rclone on a system and see if I can connect outside of Elasticsearch.

I'm even considering trying the Java S3 SDK outside of Elasticsearch... but all of this feels like something simple must be missing... any guesses as to where I can look next? I guess I could create a throwaway S3 bucket as a sanity check, assuming it won't cost an arm and a leg :wink:
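
(For the rclone test, a minimal sketch of what I have in mind, reusing the same application key pair; the remote name b2s3 is arbitrary:)

# ~/.config/rclone/rclone.conf
[b2s3]
type = s3
provider = Other
access_key_id = <your-application-key-id>
secret_access_key = <your-application-key>
endpoint = https://s3.us-west-004.backblazeb2.com

# list buckets to confirm credentials and connectivity
rclone lsd b2s3: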

Looks like a connectivity problem.
What happens if you run curl against the B2 endpoint URL from the ES server?

This bit of stack trace...

... says that it's failing while it's trying to use the instance metadata service to get credentials, which means it's not seeing any other credentials. That's why I think you are not setting the access_key and secret_key correctly in the keystore.
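
(A quick way to check, sketched here with the default Debian paths: run this on each node and look for the s3.client.* entries.)

/usr/share/elasticsearch/bin/elasticsearch-keystore list
# should include, among other entries:
#   s3.client.backblaze.access_key
#   s3.client.backblaze.secret_key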

This is how I "confirmed DNS resolution"

root@elastic-tiebreaker:~# curl --location 'https://s3.us-west-004.backblazeb2.com/' -i
HTTP/1.1 403
Server: nginx
Date: Tue, 27 Aug 2024 17:08:38 GMT
Content-Type: application/xml
Content-Length: 180
Connection: keep-alive
x-amz-request-id: e7b6a2f4aabb0236
x-amz-id-2: adeduW2sBbjtvUneMbgI=
Cache-Control: max-age=0, no-cache, no-store
Strict-Transport-Security: max-age=63072000

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
    <Code>AccessDenied</Code>
    <Message>Unauthenticated requests are not allowed for this api</Message>
</Error>

I'm assuming it's the real server responding, since I don't have much else to go on besides the presence of the response headers x-amz-request-id and x-amz-id-2.

Thank you... I am more than willing to set the keys again. But @DavidTurner, can you quickly confirm whether you think these are the correct steps? (This is what I think I did before, and repeating the same steps might give the same results.)

I add the keys; the Backblaze intro to the S3-compatible API says:

Access Key = <your-application-key-id>
Secret Key = <your-application-key>

which I do on my tiebreaker node:

/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key
# also the default to the same just in case
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

Then sync the nodes

curl --location --request POST -k 'https://10.10.10.65:9200/_nodes/reload_secure_settings'
#response
{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "<id>": {
      "name": "elastic2"
    },
    "<id>": {
      "name": "elastic-tiebreaker"
    },
    "<id>": {
      "name": "elastic1"
    }
  }
}

(not sure if I need to sanitize the id hash but I did)

  1. Do I need to visit each node and do something there, e.g. confirm the key is in the keystore by running a list or show command?
  2. Do I need to copy the client config settings to the elasticsearch.yml file?
    • If yes, which s3.client settings are needed in that file?
    • Do I use dot notation or YAML syntax for these settings in the YAML file?
  3. Do I need to copy the repository config to the nodes? I assumed that by using the API it would be distributed for me.

Ah. curl confirms DNS resolution, TCP connectivity, AND that the server is responding, as long as it was run from the server that generated the "Connect timed out" error.

They look correct indeed. The only thing I can think of that might be wrong is that /usr/share/elasticsearch/bin/elasticsearch-keystore is editing a different keystore from the one that ES is looking at. You need to run elasticsearch-keystore in exactly the same environment as the one in which Elasticsearch itself runs. In particular, if you set ES_PATH_CONF to specify a particular config path when running ES then you need to set the same environment variable for elasticsearch-keystore. You can also confirm that it's editing the right file by looking at the file timestamps.
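
For instance (just a sketch, assuming the Debian package's default config path), you could run the keystore tool with the same config path and compare the file timestamp afterwards:

ES_PATH_CONF=/etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch-keystore list
ls -l /etc/elasticsearch/elasticsearch.keystore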

Thank you, your prompting helped! I googled and ended up at the instructions pointing out the config files where the environment variables are found/defined.

I have not changed any environment variables from the defaults (I'm using the Debian package on Ubuntu 24.04).

root@elastic-tiebreaker:~# cat /etc/default/elasticsearch
################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#ES_JAVA_HOME=

# Elasticsearch configuration directory
# Note: this setting will be shared with command-line tools
ES_PATH_CONF=/etc/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

I just updated the s3.client.backblaze.* keys and checked the timestamp on the keystore:

root@elastic-tiebreaker:~# ls -lah /etc/elasticsearch/elasticsearch.keystore
-rw-rw---- 1 root elasticsearch 691 Aug 26 23:37 /etc/elasticsearch/elasticsearch.keystore
root@elastic-tiebreaker:~# /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key
Setting s3.client.backblaze.secret_key already exists. Overwrite? [y/N]y
Enter value for s3.client.backblaze.secret_key:
root@elastic-tiebreaker:~# /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
Setting s3.client.backblaze.access_key already exists. Overwrite? [y/N]y
Enter value for s3.client.backblaze.access_key:
root@elastic-tiebreaker:~# ls -lah /etc/elasticsearch/elasticsearch.keystore
-rw-rw---- 1 root elasticsearch 691 Aug 27 20:05 /etc/elasticsearch/elasticsearch.keystore

I then run the secure-settings reload, which says all is good:

curl --location --request POST -k 'https://10.10.10.65:9200/_nodes/reload_secure_settings' --header 'Content-Type: application/json' --header 'Authorization: ApiKey <key_hash>'
{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "<id>": {
      "name": "elastic-tiebreaker"
    },
    "<id>": {
      "name": "elastic2"
    },
    "<id>": {
      "name": "elastic1"
    }
  }
}

But when I check the timestamps of the other master nodes' keystores, they are not updating:

❯ ssh elastic1.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 536 Aug 11 03:33 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 439 Aug 15 23:07 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.68 closed.
### ran the sync on the other node ###
❯ ssh elastic1.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 536 Aug 11 03:33 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd 'ls -l /etc/elasticsearch/elasticsearch.keystore'
-rw-rw---- 1 root elasticsearch 439 Aug 15 23:07 /etc/elasticsearch/elasticsearch.keystore
Connection to 10.10.10.68 closed.

Those keystores don't actually have the client info stored at all

❯ ssh elastic1.lxd '/usr/share/elasticsearch/bin/elasticsearch-keystore list'
autoconfiguration.password_hash
keystore.seed
xpack.security.http.ssl.keystore.secure_password
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
Connection to 10.10.10.69 closed.
❯ ssh elastic2.lxd '/usr/share/elasticsearch/bin/elasticsearch-keystore list'
keystore.seed
xpack.security.http.ssl.keystore.secure_password
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
Connection to 10.10.10.68 closed.

Since we're this deep into it, I will await your guidance, @DavidTurner, as to the "right" way to sync the keystores, because I can for sure just ssh to each node and set them manually if needed. But it seems like the issue is likely that there is no client auth info in those keystores!

Right, yes, that would explain it indeed. Secrets in the keystore are not replicated across nodes; you need to set them up on every node. See these docs for more information:

These [Secure] settings, just like the regular ones in the elasticsearch.yml config file, need to be specified on each node in the cluster. Currently, all secure settings are node-specific settings that must have the same value on every node.
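
(i.e. on every node, not just the tiebreaker, run something like the following, assuming the default Debian paths; a single reload_secure_settings call afterwards reaches all nodes:)

/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.backblaze.secret_key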

I opened Report more information about keystore contents on reload · Issue #112268 · elastic/elasticsearch · GitHub because I think you would have spotted this issue sooner had we included more information in the reload-secure-settings response.

@DavidTurner you are the best, thank you!

Let me know if I should comment on the GitHub issue to help give it a little traction or context... I assume your word is more valuable than mine.

After re-reading the docs you referenced, I visited each node and added the config to the keystore. I verified the file timestamps and headed back to the tiebreaker. I ran the /_snapshot API and was smacked in the face with the same error/failure.

I went back to each node and added the s3.client.default.* keys even though I didn't want to use them... back to the tiebreaker, and more failure. :poop:

I re-re-read the docs, and since I needed to visit each node anyway, maybe I needed to run the /_nodes/reload_secure_settings command against each node too, so I did that...

Then it dawned on me: maybe instead of trying to load the snapshot config on the tiebreaker I should send it over to a different node. So instead of:

curl -XPUT 'https://localhost:9200/_snapshot/backblaze?pretty' --data '@/etc/elasticsearch/backblaze_repository'

I sent that same request to the second node (not the master):

root@elastic-tiebreaker:~# curl -k -u 'elastic:password' -H'Content-Type: application/json' -XPUT 'https://10.10.10.68:9200/_snapshot/backblaze?pretty' --data '@/etc/elasticsearch/backblaze_repository'
{
  "acknowledged" : true
}

:astonished:

What could have been done differently that I didn't try?

  1. Once the keystores were manually updated across the cluster, maybe I could have used Kibana to register the repo, but I didn't try it.
  2. I could have tried registering the repo on the tiebreaker but not through localhost, since elasticsearch.yml is all set up for the IP?

Likely solution steps:

  1. Ensure the keystore on each node is manually updated with the client keys.
  2. Send the /_snapshot request to a master-eligible node that isn't voting-only (maybe the ingest role is needed?).

:muscle: :tada: :pray:
I can confirm the first snapshot is on its way to Backblaze B2!

Thank you all for your time and talents.