[Solved] Improving Snapshot Recovery Speed

Nathan_F · September 18, 2015, 7:40am

Hi all,

I have a cluster that I am working with on AWS that is 18TB in size and growing daily. Right now I create daily indices which are backed up to S3. I am looking at consolidating the data from several servers into just a few larger ones and was wondering if others have similar experiences.

During the startup of an entirely new cluster, I am making the following changes (taken from the log):

updating [cluster.routing.allocation.node_initial_primaries_recoveries] from [4] to [50]
updating [indices.recovery.max_bytes_per_sec] from [40mb] to [2gb]
updating [indices.recovery.concurrent_streams] from [3] to [40]
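
For readers who want to reproduce these changes, here is a minimal sketch of how the three settings from the log could be sent to the cluster settings API (`PUT /_cluster/settings`). The host `http://localhost:9200`, the helper name `build_settings_request`, and the choice of transient scope are assumptions for illustration, not details from the original post. Whether each setting is dynamically updatable depends on the Elasticsearch version.

```python
import json
import urllib.request

# Values copied from the log lines above.
RECOVERY_SETTINGS = {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 50,
    "indices.recovery.max_bytes_per_sec": "2gb",
    "indices.recovery.concurrent_streams": 40,
}


def build_settings_request(base_url, settings, persistent=False):
    """Build (but do not send) a PUT /_cluster/settings request.

    `build_settings_request` is a hypothetical helper, not part of any
    Elasticsearch client library.
    """
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed endpoint; replace with the address of a real node.
    req = build_settings_request("http://localhost:9200", RECOVERY_SETTINGS)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the settings: urllib.request.urlopen(req)
```

Transient scope is used here because the changes are only wanted while the new cluster is being populated; they are lost on a full cluster restart.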

From the _snapshot api:

"daily_backup" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "my-bucket",
      "protocol" : "https",
      "base_path" : "daily",
      "max_restore_bytes_per_sec" : "4096mb",
      "max_snapshot_bytes_per_sec" : "200mb"
    }
  }

I realize that there are probably diminishing returns when setting larger numbers, but I am only seeing about 1GB/min of restoration per machine from S3 backups. The data is being written to RAID 0 non-EBS drives. Am I possibly missing something that would help speed this up? The AWS servers that I am using to test this (d2.2xlarge) must be able to pull S3 data more quickly than that. I will take a look at setting max_num_segments = 1 (and redoing all of my backups) in the hope that this might help overall performance for restoration as well as day-to-day operation. Otherwise I would love to hear suggestions. If more information would be helpful, I am happy to oblige.
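
To put the 1GB/min figure in context, here is a rough back-of-the-envelope sketch. It converts the observed rate into MB/s and estimates how long a full 18TB restore would take. The node count of 6 is purely hypothetical (the post does not say how many machines are in the new cluster), and binary units (1 GB = 1024 MB) are assumed.

```python
def gb_per_min_to_mb_per_sec(gb_per_min):
    """Convert a rate in GB/min to MB/s, assuming 1 GB = 1024 MB."""
    return gb_per_min * 1024 / 60


def restore_hours(total_tb, gb_per_min_per_node, nodes):
    """Estimate hours to restore `total_tb` when every node restores
    at the same rate in parallel (assumes 1 TB = 1024 GB)."""
    total_gb = total_tb * 1024
    minutes = total_gb / (gb_per_min_per_node * nodes)
    return minutes / 60


if __name__ == "__main__":
    rate = gb_per_min_to_mb_per_sec(1)  # observed rate from the post
    print(f"per-node rate: {rate:.1f} MB/s")
    # Hypothetical cluster of 6 nodes; the real count is not stated.
    print(f"18 TB on 6 nodes: {restore_hours(18, 1, 6):.1f} hours")
```

At roughly 17 MB/s per machine, the observed rate is below even the previous 40mb value of indices.recovery.max_bytes_per_sec shown in the log, which hints that the configured throttles may not be the limiting factor.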

Note: I made a few changes to the snapshot restoration API that allow me to trigger multiple snapshot restorations at once. https://github.com/elastic/elasticsearch/pull/12258 (I never touch Java, so please don't judge what I did there too harshly.)

Nathan_F · October 6, 2015, 8:33am

It is always the simpler things, isn't it? I have my ES cluster behind a NAT in a private subnet on Amazon. Changing the NAT's instance type to one that supports "high" network performance has at least quadrupled the speed. It looks like that was the only bottleneck.

mikemccand · October 6, 2015, 8:53am

Maybe try removing restore throttling altogether on the restore (set max_restore_bytes_per_sec to 0)?

This recent issue https://github.com/elastic/elasticsearch/pull/13828 means that ES is throttling much more than you requested.

If you do see a speedup, please report back!
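
As a sketch of the suggestion above, the repository definition posted earlier in the thread could be re-registered with the restore throttle set to 0. The snippet only builds the JSON body that would be sent as `PUT /_snapshot/daily_backup`. The helper name `without_restore_throttle` is hypothetical, and the assumption that "0" disables throttling comes from the suggestion in this post, so it is worth checking against the documentation for your version.

```python
import copy
import json

# Repository definition as shown earlier in the thread.
DAILY_BACKUP = {
    "type": "s3",
    "settings": {
        "bucket": "my-bucket",
        "protocol": "https",
        "base_path": "daily",
        "max_restore_bytes_per_sec": "4096mb",
        "max_snapshot_bytes_per_sec": "200mb",
    },
}


def without_restore_throttle(repo):
    """Return a copy of the repository definition with the restore
    throttle set to "0"; the input dict is left unchanged."""
    updated = copy.deepcopy(repo)
    updated["settings"]["max_restore_bytes_per_sec"] = "0"
    return updated


if __name__ == "__main__":
    # Body for PUT /_snapshot/daily_backup (not sent here).
    print(json.dumps(without_restore_throttle(DAILY_BACKUP), indent=2))
```

The snapshot throttle (max_snapshot_bytes_per_sec) is left untouched, since only the restore path is under discussion here.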

Nathan_F · October 7, 2015, 4:02am

Does that setting interact with indices.recovery.max_bytes_per_sec? Should I set both to zero?

mikemccand · October 7, 2015, 7:52am

I think recovery throttling is not affected by the above bug (only restoring a snapshot), so you shouldn't need to set indices.recovery.max_bytes_per_sec to 0 (unless you separately want to!).
