Which repository do you use to snapshot & restore large datasets?


(Igor Kupczyński) #1

Hi,

We want to introduce snapshot and, in case of failure, restore to our ES setup.

Each of our clusters holds more than a few TB of primary data (1-2x that on disk, depending on the replication factor). We are experimenting with Google Cloud Storage in S3 compatibility mode as a repository, but the recovery rates are abysmally slow. It looks like we are being throttled on their side.

Do any of you run snapshot and restore on large data sets? What is the size of your data? Which repository do you use? What throughput do you get? Are you happy with the solution? Just share your experience :)

Cheers,
Igor


(Mark Walkom) #2

S&R doesn't take a copy of the replicas, just the primaries.
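
Since only the primaries are copied, the primary data size is the figure to use when estimating how long a restore will take. A rough back-of-the-envelope sketch in Python; the function name, the 3 TB size and the throughput figures are illustrative assumptions, not measurements from anyone's cluster:

```python
def estimate_restore_hours(primary_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move `primary_tb` terabytes at a sustained aggregate rate.

    Only primary shard data counts: snapshots do not include replicas.
    """
    total_mb = primary_tb * 1024 * 1024  # TB -> MB (binary units)
    return total_mb / throughput_mb_per_s / 3600


# Hypothetical numbers: 3 TB of primaries at two assumed aggregate rates.
for rate in (40, 400):
    print(f"{rate} MB/s -> {estimate_restore_hours(3, rate):.1f} h")
```

Comparing the rate you actually observe with numbers like these can help tell whether the bottleneck is the repository or a throttle inside the cluster.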


(Igor Kupczyński) #3

I know :-), but my question was more about which repos you use,
what your dataset size is, and what your experience with S&R has been.

I think it'll be a good reference point for us on what is possible.

Thanks,
Igor


(Mark Walkom) #4

We see a lot of people use S3 if they are already in AWS. They can then potentially leverage Glacier for long-term storage :)
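
For reference, an S3 repository is registered with a `PUT _snapshot/<name>` request. The sketch below only builds the path and JSON body in Python and does not send anything. The repository name, bucket name and rate values are made up. `max_restore_bytes_per_sec` and `max_snapshot_bytes_per_sec` are per-node repository throttles whose defaults vary by Elasticsearch version, so they are worth checking when restores look slow. The `s3` type assumes the S3 repository plugin is installed.

```python
import json


def s3_repository_request(repo_name, bucket, restore_rate="200mb", snapshot_rate="200mb"):
    """Build the path and body for registering an S3 snapshot repository.

    All argument values are placeholders; nothing is sent to a cluster.
    """
    path = f"/_snapshot/{repo_name}"
    body = {
        "type": "s3",  # assumes the S3 repository plugin is installed
        "settings": {
            "bucket": bucket,
            # Per-node throttles; the defaults depend on the ES version.
            "max_restore_bytes_per_sec": restore_rate,
            "max_snapshot_bytes_per_sec": snapshot_rate,
        },
    }
    return path, body


# Hypothetical repository and bucket names.
path, body = s3_repository_request("my_backup", "my-es-snapshots")
print("PUT", path)
print(json.dumps(body, indent=2))
```

The printed body can be sent with curl or any HTTP client once the names match your setup.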

