S3 - setting and verifying repo issue

Trying our in-house S3-compatible storage device as the frozen tier, but seeing errors when verifying the repository. Got confirmation from Elastic support that the device we have fully supports the S3 protocol.

Here are the things done so far (Elasticsearch v7.13 on RHEL):

  1. Made sure there is no network issue between Elasticsearch and the in-house S3 endpoint (both reside in the same DC).

  2. Added the below config to elasticsearch.yml:

s3.client.es_s3.endpoint: "host:port"
s3.client.es_s3.protocol: https

  3. Added the access and secret keys to the Elasticsearch keystore:

$ bin/elasticsearch-keystore list
s3.client.es_s3.access_key
s3.client.es_s3.secret_key

  4. Enabled the repository-s3 plugin.

  5. Since it's an HTTPS URL, added its cert to the default jdk/lib/security/cacerts truststore, and also to both Elasticsearch keystores (transport & http).

  6. An openssl connect using the crt file works fine.

  7. The AWS CLI connection works fine when using the crt file, and also when I add the cert to a PEM bundle, e.g.:

/usr/local/bin/aws s3 ls --endpoint-url=https://host:port s3://bucket1/ --ca-bundle $AWS_CA_BUNDLE

My question: where exactly should the client cert be added when setting up the repo? It seems Elasticsearch is not able to use all the keystores/truststores available in the install/config paths. I tried changing "xpack.security.http.ssl.keystore.path" to point to a PEM-format file instead of a p12, but Elasticsearch wouldn't even start when I did so.
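For reference, the keystore step above can be sketched as commands (a sketch only, assuming a node on localhost:9200 and paths relative to the Elasticsearch home; the client name es_s3 must match the repository's "client" setting):

```shell
# Register the per-client S3 credentials in the Elasticsearch keystore
# ("es_s3" must match the "client" setting of the repository).
bin/elasticsearch-keystore add s3.client.es_s3.access_key
bin/elasticsearch-keystore add s3.client.es_s3.secret_key
bin/elasticsearch-keystore list

# Secure settings are only picked up after a node restart or an explicit
# reload of secure settings:
curl -X POST "localhost:9200/_nodes/reload_secure_settings?pretty"
```

If the keys were added before the last restart, this is already covered; it is only worth re-checking because stale secure settings can produce confusing errors.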

Here's the error when I try to run the PUT snapshot command:

# PUT /_snapshot/es_s3
{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_verification_exception",
        "reason" : "[es_s3] path  is not accessible on master node"
      }
    ],
    "type" : "repository_verification_exception",
    "reason" : "[es_s3] path  is not accessible on master node",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Unable to upload object [tests-CardaXhOS_KsrWSs-pSKcg/master.dat] using a single upload",
      "caused_by" : {
        "type" : "sdk_client_exception",
        "reason" : "Failed to connect to service endpoint: ",
        "caused_by" : {
          "type" : "socket_timeout_exception",
          "reason" : "Read timed out"
        }
      }
    }
  },
  "status" : 500
}

If you have any suggestions, please let me know. Thanks for your time.

What does your full PUT command for the S3 repository look like? Here is my example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch.mydomain.com-snapshots-eu-central-1"
  }
}

or the result from

GET /_snapshot/_all

Here it is:

# with path style added
PUT /_snapshot/elastic_search_s3
{
  "type": "s3",
  "settings": {
    "bucket": "bucket1",
    "endpoint": "host:port",
    "protocol": "https",
    "path_style_access": "true"
  }
}

What happens without these settings? Just to rule out SSL issues.

I think your secrets should be

s3.client.default.access_key
s3.client.default.secret_key

Unless you set the client name to es_s3 in the repository settings? Did you?

client

The name of the S3 client to use to connect to S3. Defaults to default.

It is not the name of the snapshot

I have the client name set to es_s3 (same as repo name) and so the secrets are named as such: s3.client.es_s3.access_key & s3.client.es_s3.secret_key.

Good point on trying the http protocol; I tested against it this morning and still see the same issue.
Changed the elasticsearch.yml settings to:

s3.client.es_s3.endpoint: "host:httpport"
s3.client.es_s3.protocol: http

Plugin has been enabled:

bin/elasticsearch-plugin list:
repository-s3

Dev tools command:

DELETE _snapshot/es_s3

PUT /_snapshot/es_s3
{
  "type": "s3",
  "settings": {
    "bucket": "bucket1",
    "endpoint": "host:httpport",
    "protocol": "http"
  }
}

Error msg:

[2021-10-14T09:41:52,945][WARN ][r.suppressed             ] [ingest1] path: /_snapshot/es_s3, params: {pretty=true, repository=es_s3}
org.elasticsearch.repositories.RepositoryVerificationException: [es_s3] path  is not accessible on master node
Caused by: java.io.IOException: Unable to upload object [tests-WarEYldYSLyB-kW-08AebQ/master.dat] using a single upload
.....
Caused by: com.amazonaws.SdkClientException: Failed to connect to service endpoint:
.....
Caused by: java.net.SocketTimeoutException: Read timed out

Not sure what else might be missing!!

I tried NOT giving a client name and instead used the default (with the minimal expected configuration). Added the default access & secret keys to the Elasticsearch keystore; however, I see the error below when attempting the same:

elasticsearch.yml:

s3.client.default.endpoint: "host:httpport"
s3.client.default.protocol: http

verified keystore list:

bin/elasticsearch-keystore list
s3.client.default.access_key
s3.client.default.secret_key

Devtools commands:

DELETE _snapshot/es_s3

PUT /_snapshot/es_s3
{
  "type": "s3",
  "settings": {
    "bucket": "bucket1"
  }
}

Error msg:

org.elasticsearch.repositories.RepositoryVerificationException: [es_s3] path  is not accessible on master node
Caused by: java.io.IOException: Unable to upload object [tests-bsxr8NkARKewFF0NPaoFwg/master.dat] using a single upload
...
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: bucket1.host
...
Caused by: java.net.UnknownHostException: bucket1.host

Verified that telnet from the Elasticsearch node to host on httpport works fine, so connectivity to the host itself is not the issue.

@vee

What happens when you just use the basic configuration as documented, without the additional parameters? Is this possible?

How many nodes? What version?

You can turn up the logs to DEBUG and see if you get more verbose error logs.

Not sure if you redacted those values, but the log message appears to indicate that Elasticsearch cannot resolve the host.

Trying on a single node for now, with v7.13.

Did turn on the trace, and that's all I got.

PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.snapshots" : "TRACE",
    "logger.org.elasticsearch.repositories.s3": "TRACE",
    "logger.com.amazonaws" : "DEBUG"
  }
}
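Once debugging is done, those transient loggers can be reset by setting them to null (a sketch, assuming a node on localhost:9200):

```shell
# Setting a transient cluster setting to null removes it, restoring the
# default log levels.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger.org.elasticsearch.snapshots": null,
    "logger.org.elasticsearch.repositories.s3": null,
    "logger.com.amazonaws": null
  }
}'
```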

I did try to use the most basic config this time:

  1. Removed the S3 entries from elasticsearch.yml
  2. Updated the keystore with the default client access key & secret key as shared earlier
  3. Removed and re-installed the repository-s3 plugin
  4. Restarted Elasticsearch
  5. Deleted the existing snapshot repositories
  6. Created a new one using the below:
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "bucket1",
    "endpoint": "host",
    "protocol": "http"
  }
}

Error msg in logs:

Could not determine repository generation from root blobs
Caused by: java.io.IOException: Exception when listing blobs by prefix [index-]
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: bucket1.host
Caused by: java.net.UnknownHostException: bucket1.host

Not sure why it's using the bucket name in front of the host, as that does not point to a valid host.
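That bucket-prefixed hostname is the AWS SDK's default virtual-hosted-style addressing, where the bucket name is prepended to the endpoint host; the "path_style_access": "true" setting tried earlier keeps the bucket in the URL path instead. A minimal sketch of the difference, with a hypothetical host:9000 endpoint standing in for the real one:

```shell
# Hypothetical endpoint and bucket, standing in for the real values.
bucket="bucket1"
endpoint="host:9000"

# Virtual-hosted style (the SDK default): the bucket becomes part of the
# hostname, so DNS must resolve "bucket1.host" -- hence UnknownHostException.
virtual_url="http://${bucket}.${endpoint}/master.dat"

# Path style ("path_style_access": "true"): the bucket is a path segment,
# so only "host" itself needs to resolve.
path_url="http://${endpoint}/${bucket}/master.dat"

echo "$virtual_url"   # http://bucket1.host:9000/master.dat
echo "$path_url"      # http://host:9000/bucket1/master.dat
```

On-prem S3-compatible endpoints often lack the wildcard DNS (*.host) that virtual-hosted style needs, so path-style access is usually the right choice there.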

Actually, I got more logs from debug:


[2021-10-14T11:13:35,524][DEBUG][c.a.h.IdleConnectionReaper] [ingest1] Reaper thread:
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method) ~[?:?]
        at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:188) [aws-java-sdk-core-1.11.749.jar:?]
[2021-10-14T11:13:35,525][DEBUG][c.a.h.IdleConnectionReaper] [ingest1] Shutting down reaper thread.
[2021-10-14T11:13:35,538][INFO ][o.e.r.RepositoriesService] [ingest1] deleted repositories [[my_s3_repository]]
[2021-10-14T11:13:36,798][DEBUG][o.e.r.s.S3Repository     ] [ingest1] using bucket [bucket1], chunk_size [5tb], server_side_encryption [false], buffer_size [100mb], cannedACL [], storageClass []
[2021-10-14T11:13:36,875][DEBUG][o.e.r.s.S3Repository     ] [ingest1] using bucket [bucket1], chunk_size [5tb], server_side_encryption [false], buffer_size [100mb], cannedACL [], storageClass []
[2021-10-14T11:13:36,887][INFO ][o.e.r.RepositoriesService] [ingest1] put repository [my_s3_repository]
[2021-10-14T11:13:36,887][DEBUG][o.e.r.s.S3Service        ] [ingest1] Using basic key/secret credentials
[2021-10-14T11:13:36,888][DEBUG][o.e.r.s.S3Service        ] [ingest1] using endpoint [http://host:port] and region [null]
[2021-10-14T11:13:36,889][DEBUG][c.a.m.CsmConfigurationProviderChain] [ingest1] Unable to load configuration from com.amazonaws.monitoring.EnvironmentVariableCsmConfigurationProvider@46af768b: Unable to load Client Side Monitoring configurations from environment variables!
[2021-10-14T11:13:36,889][DEBUG][c.a.m.CsmConfigurationProviderChain] [ingest1] Unable to load configuration from com.amazonaws.monitoring.SystemPropertyCsmConfigurationProvider@fa9e1c7: Unable to load Client Side Monitoring configurations from system properties variables!
[2021-10-14T11:13:36,890][DEBUG][c.a.m.CsmConfigurationProviderChain] [ingest1] Unable to load configuration from com.amazonaws.monitoring.ProfileCsmConfigurationProvider@7a744664: Unable to load config file
[2021-10-14T11:13:36,891][DEBUG][c.a.request              ] [ingest1] Sending Request: PUT http://bucket1.host:port /tests-qZtxTruaT1SgWJcDwKBn1Q/master.dat Headers: (amz-sdk-invocation-id: bf97d615-f08c-c736-dc65-bdf80e6afca1, Content-Length: 22, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.11.749 Linux/4.18.0-305.10.2.el8_4.x86_64 OpenJDK_64-Bit_Server_VM/16+36 java/16 vendor/AdoptOpenJDK, x-amz-acl: private, x-amz-storage-class: STANDARD, )
[2021-10-14T11:13:36,891][DEBUG][c.a.a.AWS4Signer         ] [ingest1] AWS4 Canonical Request: '"PUT
/tests-qZtxTruaT1SgWJcDwKBn1Q/master.dat

amz-sdk-invocation-id:bf97d615-f08c-c736-dc65-bdf80e6afca1
amz-sdk-retry:0/0/500
content-length:195
content-type:application/octet-stream
host:bucket1.host:port
user-agent:aws-sdk-java/1.11.749 Linux/4.18.0-305.10.2.el8_4.x86_64 OpenJDK_64-Bit_Server_VM/16+36 java/16 vendor/AdoptOpenJDK
x-amz-acl:private
x-amz-content-sha256:STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-date:20211014T161336Z
x-amz-decoded-content-length:22
x-amz-storage-class:STANDARD

amz-sdk-invocation-id;amz-sdk-retry;content-length;content-type;host;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-storage-class
STREAMING-AWS4-HMAC-SHA256-PAYLOAD"
[2021-10-14T11:13:36,891][DEBUG][c.a.a.AWS4Signer         ] [ingest1] AWS4 String to Sign: '"AWS4-HMAC-SHA256
20211014T161336Z
20211014/us-east-1/s3/aws4_request
7262d65a576f418bb902e28071c30b397e46e2f6528cb911f0b32540357ccccd"
[2021-10-14T11:13:36,894][DEBUG][c.a.h.c.ClientConnectionManagerFactory] [ingest1]
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:567) ~[?:?]
        at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.conn.$Proxy30.connect(Unknown Source) [?:?]
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393) [httpclient-4.5.10.jar:4.5.10]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) [httpclient-4.5.10.jar:4.5.10]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) [httpclient-4.5.10.jar:4.5.10]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) [httpclient-4.5.10.jar:4.5.10]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) [httpclient-4.5.10.jar:4.5.10]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) [httpclient-4.5.10.jar:4.5.10]
        at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1323) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524) [aws-java-sdk-core-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5054) [aws-java-sdk-s3-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5000) [aws-java-sdk-s3-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394) [aws-java-sdk-s3-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5942) [aws-java-sdk-s3-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1808) [aws-java-sdk-s3-1.11.749.jar:?]
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1768) [aws-java-sdk-s3-1.11.749.jar:?]
        at ...

Seems there is a limitation on the number of characters I can share here. Any other site you can recommend so I can share the full content for review?

You can use Pastebin, but @vee, I would suggest engaging Elastic Support to debug this. If you have support, use it.

Also, did you try to access your S3 directly with http? Is it locked down to https?

Also, you asked where the cert goes:

protocol

The protocol to use to connect to S3. Valid values are either http or https. Defaults to https. When using HTTPS, this plugin validates the repository's certificate chain using the JVM-wide truststore. Ensure that the root certificate authority is in this truststore using the JVM's keytool tool.
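A sketch of what that quoted passage describes: importing the storage device's CA certificate into the JVM-wide truststore of the bundled JDK with keytool (the alias, cert path, and ES_HOME here are placeholders; "changeit" is the stock truststore password):

```shell
# Import the CA cert into the bundled JDK's JVM-wide truststore...
keytool -importcert -noprompt -alias inhouse-s3-ca \
  -file /path/to/s3-ca.crt \
  -keystore "$ES_HOME/jdk/lib/security/cacerts" \
  -storepass changeit

# ...and confirm it is now listed.
keytool -list -keystore "$ES_HOME/jdk/lib/security/cacerts" \
  -storepass changeit | grep -i inhouse-s3-ca
```

This truststore is separate from the xpack.security.*.ssl keystores, which secure Elasticsearch's own transport and HTTP layers and are not consulted by the repository-s3 client.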

What about something even more basic, for example?

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch.mydomain.com-snapshots-eu-central-1"
  }
}

As you can see from my setup, it connects via https:

[2021-10-14T18:28:29,658][DEBUG][c.a.request              ] [IsaacAsimov] Sending Request: GET https://elasticsearch.mydomain.com-snapshots-eu-central-1.s3.eu-central-1.amazonaws.com /index-5 Headers: (amz-sdk-invocation-id: 476dd792-df59-31fd-c147-2f5d83c0d0a2, Content-Type: application/octet-stream, User-Agent: aws-sdk-java/1.11.749 Linux/4.19.0-18-amd64 OpenJDK_64-Bit_Server_VM/16.0.2+7 java/16.0.2 vendor/Eclipse_Foundation, ) 
[2021-10-14T18:28:29,659][DEBUG][c.a.a.AWS4Signer         ] [IsaacAsimov] AWS4 Canonical Request: '"GET
/index-5

Thanks guys, I did engage Elastic support a couple of weeks ago, but because it's not a Sev 1 the response time is usually much slower than needed, especially when I'm trying things. This forum is surely much quicker to respond, so I thought of trying it out here.

@zx8086: FYI - this is an in-house S3-compatible storage device (not AWS cloud), so I'm using what our system admins provided me to connect to it. It works when tried with the aws cli from the same Elasticsearch node.

What is the aws cli command you are using?

I would imagine that without the http setting you would see the true error if the endpoint rejects plain traffic.

@vee

Interesting...

I am testing 7.13.2 Single node on my localhost

I added the plugin

./bin/elasticsearch-plugin install repository-s3

I just did this with literal values to see what happens:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "endpoint" : "s3host",
    "protocol" : "http"
  }
}

In my logs I see this!


[2021-10-14T09:52:28,131][DEBUG][c.a.h.IdleConnectionReaper] [ceres] Reaper thread: 
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method) ~[?:?]
        at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:188) [aws-java-sdk-core-1.11.749.jar:?]
[2021-10-14T09:52:28,132][DEBUG][c.a.h.IdleConnectionReaper] [ceres] Shutting down reaper thread.
[2021-10-14T09:52:28,145][INFO ][o.e.r.RepositoriesService] [ceres] updated repository [my_s3_repository]
[2021-10-14T09:52:28,146][DEBUG][o.e.r.s.S3Service        ] [ceres] Using instance profile credentials
[2021-10-14T09:52:28,146][DEBUG][o.e.r.s.S3Service        ] [ceres] using endpoint [http://s3host] and region [null]

Note the endpoint in that last line: [http://s3host], correct, and not prefixed with the bucket name.

Perhaps you should uninstall and re-install the s3 plugin...

Or try version 7.13.2; perhaps there was a bug.

Interesting, I am in fact using v7.13.2 on my end (my bad, I only mentioned the minor version earlier). Interesting that when you tried to connect, you don't see the bucket name prefix. Did you have to enable any additional permissions listed at the time of plugin installation?

bin]$ ./elasticsearch-plugin install repository-s3
-> Installing repository-s3
-> Downloading repository-s3 from elastic
[=================================================] 100%  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.util.PropertyPermission es.allow_insecure_settings read,write
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-s3
-> Please restart Elasticsearch to activate any plugins installed

Nope, my plugin install looks exactly the same as yours...

except I do notice something really minor: I run mine from the Elasticsearch root directory

./bin/elasticsearch-plugin install repository-s3

sometimes the path to that root directory is used to set some config etc. (probably not... but we are looking for anything at this point)

What JDK? The Default?

@vee It worked for me without them, and then later on after adding those permissions to the java.policy file as well.

@Stephen - yes, the default JDK: install-dir/jdk. That's what both JAVA_HOME and ES_JAVA_HOME point to. I tried removing and re-installing the plugin from the Elasticsearch root dir like you did - still the same behavior.

@zx8086 - didn't quite follow you - did you mean you had to add those listed permissions to the java.policy file while installing the plugin? If so, can you share that java.policy file and its path?