Snapshot restore from an S3 bucket in a different region not working

We have AWS instances set up in different regions (eu-central-1, ap-south-1, etc.). Each instance runs its own single-node ES cluster. For each region there is an S3 bucket to which snapshots from that instance are written daily. We use Curator to manage the ES snapshot and restore process.

Until recently, for analytical purposes, we could restore indices from another region's snapshot onto a spot instance running in a different region, but this stopped working. If the spot instance is set up in the same region as the S3 bucket, the restore completes. For example, the snapshot is in an S3 bucket in eu-central-1, and we are trying to restore it on an ES instance in ap-south-1. I have tried the repository settings both with and without the endpoint setting; neither works now.
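To illustrate, the repository on the restore-side node is registered against the remote region's bucket roughly like this (a sketch only; the repository and bucket names below are placeholders):

# sketch only: placeholder repository and bucket names
curl -XPUT 'localhost:9200/_snapshot/example_remote_repo?pretty' -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "example-bucket-eu-central-1",
    "endpoint": "s3.eu-central-1.amazonaws.com"
  }
}'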

Attached is the trace-enabled log file from an attempt where the restore does not work. Any help with this is appreciated. Do let me know if additional logs or settings are needed.

We are using Elasticsearch version 6.5.1.

[2019-09-17T05:43:23,891][DEBUG][o.a.h.i.c.PoolingHttpClientConnectionManager] [tkHi76j] Connection released: [id: 8][route: {s}->https://sl-de-es5-biz.s3.eu-central-1.amazonaws.com:443][total kept alive: 1; route allocated: 1 of 50; total allocated: 1 of 50]
[2019-09-17T05:43:23,896][DEBUG][o.e.c.s.MasterService    ] [tkHi76j] processing [restore_snapshot[curator-20190131013004]]: execute
[2019-09-17T05:43:23,904][DEBUG][o.e.c.r.a.a.BalancedShardsAllocator] [tkHi76j] skipping rebalance due to in-flight shard/store fetches
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.MasterService    ] [tkHi76j] cluster state updated, version [8], source [restore_snapshot[curator-20190131013004]]
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.MasterService    ] [tkHi76j] publishing cluster state version [8]
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.ClusterApplierService] [tkHi76j] processing [apply cluster state (from master [master {tkHi76j}{tkHi76jLQ7C-M8qyCHNRcg}{qjO2MKEJRLmLtRYFzWEb_Q}{10.0.0.204}{10.0.0.204:9300}{ml.machine_memory=32151224320, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [8] source [restore_snapshot[curator-20190131013004]]])]: execute
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.ClusterApplierService] [tkHi76j] cluster state updated, version [8], source [apply cluster state (from master [master {tkHi76j}{tkHi76jLQ7C-M8qyCHNRcg}{qjO2MKEJRLmLtRYFzWEb_Q}{10.0.0.204}{10.0.0.204:9300}{ml.machine_memory=32151224320, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [8] source [restore_snapshot[curator-20190131013004]]])]
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.ClusterApplierService] [tkHi76j] applying cluster state version 8
[2019-09-17T05:43:23,905][DEBUG][o.e.c.s.ClusterApplierService] [tkHi76j] apply cluster state with version 8
[2019-09-17T05:43:23,929][DEBUG][o.e.c.s.ClusterApplierService] [tkHi76j] set locally applied cluster state to version 8

Can you provide the full configuration that you've used? In particular, how did you configure the endpoint?

Thanks for replying. Here is the repository configuration with which I just tested the restore command again; it still did not work. The endpoint setting is given in the repository configuration.

curl http://localhost:9200/_snapshot/sl_es_s3_repo_mx?pretty
{
  "sl_es_s3_repo_mx" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "sl-mx-es5-biz",
      "chunk_size" : "500mb",
      "endpoint" : "s3.us-west-1.amazonaws.com",
      "region" : "us-west-1",
      "buffer_size" : "250mb"
    }
  }
}
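
The restore itself (normally driven by Curator) boils down to a call along these lines; the snapshot name and index pattern below are placeholders:

# sketch only: snapshot name and index pattern are placeholders
curl -XPOST 'localhost:9200/_snapshot/sl_es_s3_repo_mx/<snapshot-name>/_restore?pretty' -H 'Content-Type: application/json' -d'
{
  "indices": "<index-pattern>",
  "include_global_state": false
}'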

I have also tried setting the endpoint in the elasticsearch.yml file and verified that it is being picked up in the logs:

grep "using end" shortlyst-in-dev02-2019-09-20-322.log
[2019-09-20T11:47:28,094][DEBUG][o.e.r.s.S3Service        ] [tkHi76j] using endpoint [s3.us-west-1.amazonaws.com]
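
The corresponding entry in elasticsearch.yml is roughly the following (assuming the repository uses the default S3 client):

s3.client.default.endpoint: s3.us-west-1.amazonaws.com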

But the restore does not go through.
Please let me know if you would like to look at any further logs and I can share them. I am running with logging enabled as below.

curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger._root":"DEBUG",
    "logger.org.elasticsearch.repositories.s3": "trace",
    "logger.com.amazon": "trace"
  }
}'
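
(These transient logger settings can later be reverted by setting them back to null, along these lines:)

curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger._root": null,
    "logger.org.elasticsearch.repositories.s3": null,
    "logger.com.amazon": null
  }
}'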

Have you applied this configuration correctly on all nodes in the cluster?

Can you share the full logs? In case you don't want to share them publicly, you can e-mail them to "yannick AT elastic DOT co". In particular, I'm interested in the log messages that follow the one where the endpoint is successfully set. There should be a log line where it complains about cross-region access to the bucket.

[2019-09-20T11:47:28,094][DEBUG][o.e.r.s.S3Service        ] [tkHi76j] using endpoint [s3.us-west-1.amazonaws.com]

I am running a single-node cluster. I have sent the full logs to your email. You can see two restore attempts that did not go through, and as you mentioned there is a warning below the endpoint setting in both runs.

[2019-09-20T14:43:29,572][DEBUG][o.e.r.s.S3Service        ] [tkHi76j] using endpoint [s3.amazonaws.com]
[2019-09-20T14:48:40,936][DEBUG][o.e.r.s.S3Service        ] [tkHi76j] using endpoint [s3.us-west-1.amazonaws.com]

I've looked at the logs, but they don't contain any information as to why the restore failed. What error were you getting in response to the _restore request? It looks like the shards are not being restored for some reason; can you run the cluster allocation explain API against some of the indices that failed to restore?
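
For reference, something along these lines; the index name and shard number below are placeholders:

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d'
{
  "index": "<restored-index-name>",
  "shard": 0,
  "primary": true
}'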

Interestingly enough, the restore has been working fine since yesterday. I have tried restoring from several different regions, and it all works now. The warning message below the endpoint setting still appears in the log file, but I guess that has no implication. I am currently unable to reproduce a failing restore, so I cannot run the cluster allocation explain API right now. Could this have been caused by an intermittent AWS issue? I will test this for the next couple of days anyway and update here.
