Hi, I am migrating from self-hosted Elasticsearch to AWS managed Elasticsearch. As part of the migration I am facing an issue with backup (snapshot) and restore.
We have 2 distinct clusters, one in us-east and one in us-west. A snapshot repository is registered on both clusters using the same S3 bucket (in us-east). I am able to create a manual snapshot in the east cluster, but that snapshot is not visible in the west cluster even though both use the same S3 bucket.
Please share your thoughts on this. Thanks in advance!
You shouldn't use the same snapshot repository for snapshots from multiple clusters. From the docs:
If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. All other clusters connected to that repository should set the repository to readonly mode.
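As a rough illustration of the docs quote above, the registration bodies for the one writing cluster and for a read-only cluster might look like the following. This is only a sketch: the bucket name, base path, and helper name are made up, the `readonly` setting name is the one used in the Elasticsearch repository settings, and AWS managed domains typically need additional settings (such as an IAM role) and signed requests that are not shown here.

```python
import json


def repository_body(bucket, base_path, readonly):
    """Build the JSON body for PUT _snapshot/<repo-name> (S3 repository)."""
    settings = {"bucket": bucket, "base_path": base_path}
    if readonly:
        # Only one cluster should write; every other cluster sets this flag.
        settings["readonly"] = True
    return {"type": "s3", "settings": settings}


# Hypothetical bucket and path names, for illustration only.
writer_body = repository_body("my-snapshot-bucket", "snapshots/east", readonly=False)
reader_body = repository_body("my-snapshot-bucket", "snapshots/east", readonly=True)

print(json.dumps(writer_body, indent=2))
print(json.dumps(reader_body, indent=2))
```

The cluster that takes the snapshots would register the repository with the first body, and any other cluster pointing at the same bucket and base path would use the second.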
Aside from that, you'll probably need to speak with AWS support about your questions. They use a nonstandard build of Elasticsearch, look after the infrastructure themselves, and don't usually answer questions on these forums. If you want help here, you're best off using Elastic Cloud instead.
Thanks for taking the time to answer my question. Yes, I understand, but one thing: what I meant is that the repositories are created per cluster. One snapshot repository is created for the east region cluster and another for the west region cluster, but the underlying S3 bucket is the same.
The problem is that I can see snapshot files such as "meta-.dat" and "snap-.dat" in the S3 bucket, but when I call the "_snapshot/repo/_all" API in Kibana, the snapshots are listed only on the cluster that created them (e.g. the east cluster). They are not visible from the west cluster, even though the S3 bucket is the same for both repositories.
Are there any tools or tips for digging deeper into a snapshot repo backed by an S3 bucket, such as the relation between the repo and S3? I can see the objects (snapshots) in the S3 bucket but cannot find them in the repository (e.g. in the response of "_snapshot/repo/_all").
How are the repos different then? Did you set base_path differently, for instance? If they are separate repositories then did you register both of them on each cluster, using read_only: true to prevent writes to the wrong repo?
Current architecture: ES is installed by us on EC2 instances, and a snapshot repository is registered separately on each of the two clusters. Neither repository has the "read_only" parameter set, and this works well.
Proposed architecture: ES is managed by AWS, with 2 clusters (east, west) and a separate repository registered on each one, backed by the same S3 bucket (the base_path is the same).
With your suggestion of setting "read_only" to true, this works: we can see the snapshots in the repo.
The bigger picture is to get data replicated across the 2 clusters: if I create a doc in east, it should be replicated to the west cluster. For this we are using index snapshots from S3.
Frequently switching the repo's "read_only" setting between true and false is going to be hectic, because we sometimes use east to write data (snapshots) and sometimes west.
Yes, that's because you're not supposed to have more than one cluster writing to each repository. If you have two clusters you need two separate repositories.
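To make the "two clusters, two repositories" layout concrete, here is a minimal sketch that plans the registrations so that every repository has exactly one writer. The repo names and base paths are hypothetical, and the sketch only computes the plan; it does not call any cluster.

```python
def plan_repositories(clusters):
    """Return {cluster: [(repo_name, base_path, readonly), ...]}.

    Each cluster owns one repository (writable) and registers every other
    cluster's repository as read-only, so each base_path has one writer.
    """
    plan = {}
    for cluster in clusters:
        entries = []
        for owner in clusters:
            repo_name = f"repo-{owner}"        # hypothetical naming scheme
            base_path = f"snapshots/{owner}"   # hypothetical bucket layout
            entries.append((repo_name, base_path, owner != cluster))
        plan[cluster] = entries
    return plan


plan = plan_repositories(["east", "west"])
for cluster, entries in plan.items():
    for repo_name, base_path, readonly in entries:
        print(f"{cluster}: {repo_name} base_path={base_path} readonly={readonly}")
```

Under this layout east always snapshots into its own repository and west restores from its read-only registration of that repository (and vice versa), so nobody has to toggle the read-only setting back and forth.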
This works fine with our self-hosted clusters, where we don't have the "read_only" parameter in the repo settings.
I also tried using 2 different S3 buckets (east, west), each in the same region as its ES cluster, on the AWS managed version. That is, the east cluster's repo is registered with the east S3 bucket and the west cluster's repo with the west S3 bucket. Even so, this is not working: I still cannot see snapshots created in the other region. (Note: the two buckets are configured with bi-directional cross-region replication.)
Then you definitely need AWS's help: this indicates that the problem lies either in their nonstandard infrastructure or in the differences between the official Elasticsearch and the AWS fork. Either way, they keep the details secret.
Thanks a lot for bearing with me. At least I gained some knowledge.
Finally, how can I dig into the snapshot repo and its underlying storage (e.g. the S3 bucket)? The objects (snapshots) are there in the S3 bucket, but I am not able to see them in the repository.
How can I analyze this problem? From the repository side I can only fire API calls like "_snapshot/repo-name/_all". Is there anything else I can look into to go deeper?
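One possible starting point for relating the repository to the bucket: as far as I understand, standard Elasticsearch keeps the list of snapshots in root-level "index-N" objects (plus "index.latest") under the base path, while the "snap-*.dat" and "meta-*.dat" objects only hold per-snapshot details. The sketch below takes a list of S3 object keys, obtained with whatever S3 listing tool is at hand, and reports the highest "index-N" generation. The sample keys are invented, and whether the AWS build uses the same layout is an assumption.

```python
import re


def latest_index_generation(keys, base_path=""):
    """Return the highest N among '<base_path>/index-N' keys, or None."""
    prefix = base_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    # Match only root-level index-N objects, not shard-level files under indices/.
    pattern = re.compile(re.escape(prefix) + r"index-(\d+)$")
    generations = [int(m.group(1)) for key in keys if (m := pattern.match(key))]
    return max(generations) if generations else None


# Invented sample listing, for illustration only.
sample_keys = [
    "snapshots/index-3",
    "snapshots/index-4",
    "snapshots/index.latest",
    "snapshots/snap-abc.dat",
    "snapshots/meta-abc.dat",
    "snapshots/indices/xyz/0/index-7",
]
print(latest_index_generation(sample_keys, "snapshots"))
```

If the "snap-*.dat" object for a snapshot exists in the bucket but the newest "index-N" object does not mention that snapshot, that would be consistent with a cluster not listing it in "_snapshot/repo-name/_all".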