Increase Snapshotting Speed while taking a snapshot in Amazon S3 Bucket


(Aditya Jindal) #1

Hi

We currently have a very large cluster holding about 100 TB of data. We want our snapshots to an Amazon S3 bucket to complete as fast as possible.

With reference from this

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html

&

https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-thread-pool.html

I tried increasing the maximum number of threads in the snapshot thread pool. The number of active threads increased during snapshotting, but the snapshot speed did not.

Currently I am getting about 200 MBps when snapshotting data to, or restoring it from, the S3 bucket.

I have checked that my S3 bucket's limits are far higher than this (around 2 GBps).
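
For scale, here is a rough back-of-the-envelope sketch of what those two rates mean for 100 TB. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a full rather than incremental snapshot, and a constant transfer rate; the function name is illustrative only.

```python
# Rough estimate of how long a full 100 TB snapshot takes at a given rate.
# Assumptions (not from the thread): decimal units and a constant rate.
TB = 10**12
GB = 10**9
MB = 10**6


def transfer_hours(data_bytes, rate_bytes_per_sec):
    """Hours needed to move data_bytes at a steady rate_bytes_per_sec."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate_bytes_per_sec must be positive")
    return data_bytes / rate_bytes_per_sec / 3600


current = transfer_hours(100 * TB, 200 * MB)  # observed snapshot rate
limit = transfer_hours(100 * TB, 2 * GB)      # stated S3 bucket limit

print(f"at 200 MB/s: {current:.1f} h ({current / 24:.1f} days)")
print(f"at 2 GB/s:   {limit:.1f} h")
```

Under these assumptions the observed rate works out to several days for a full snapshot, versus well under a day at the bucket's stated limit.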

Also, when I restore a snapshot from Amazon S3, I see active threads in the generic thread pool instead of in the snapshot thread pool.

What could be wrong here?

Thanks in advance!


(David Pilato) #2

There is some internal throttling that applies to all kinds of repositories, according to:

Basically, you can try changing max_snapshot_bytes_per_sec in your S3 repository settings. Something like (untested):

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "max_snapshot_bytes_per_sec": "100m"
  }
}

This setting is the rate per node as explained in https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
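
Because the limit is per node, the cluster-wide ceiling scales with the number of nodes that upload at the same time. The sketch below works that out. It is only an illustration: the parser is a simplified stand-in that assumes binary multiples (1kb = 1024 bytes) and treats single-letter suffixes such as "m" like "mb", and the node count of 10 is hypothetical.

```python
import re

# Simplified stand-in for Elasticsearch byte size values (assumption:
# binary multiples, and "m" is treated the same as "mb").
_UNITS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
    "t": 1024**4, "tb": 1024**4,
}


def parse_size(value):
    """Convert a string such as '100m' or '10gb' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-z]+)", value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size value: {value!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def cluster_ceiling(per_node_limit, uploading_nodes):
    """Upper bound in bytes/s if every node uploads at its limit at once."""
    return parse_size(per_node_limit) * uploading_nodes


# Hypothetical cluster of 10 data nodes with the "100m" limit shown above.
print(cluster_ceiling("100m", 10) / 1024**2, "MiB/s")
```

This is an upper bound only; the real rate also depends on how shards are spread across nodes and on the network.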

HTH


(Aditya Jindal) #3

Hi @dadoonet,

Thanks for your reply.

I forgot to mention that my repository settings already have max_snapshot_bytes_per_sec and max_restore_bytes_per_sec set to 10gb each.

For reference, my settings are:

{
  "my_repo" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "bucket_name",
      "chunk_size" : "1gb",
      "server_side_encryption" : "false",
      "max_restore_bytes_per_sec" : "10gb",
      "buffer_size" : "100mb",
      "base_path" : "/snapshots/",
      "region" : "us-east-1",
      "max_snapshot_bytes_per_sec" : "10gb"
    }
  }
}

Following the same documentation, I tried decreasing buffer_size so that more multipart uploads to S3 would be used, but I did not get any significant improvement.

While browsing the code, I came across this setting for increasing the number of concurrent streams that write to S3.

How can I tweak this setting?

Thanks in advance!


(David Pilato) #4

Sounds like a copy and paste from another repository's javadoc. :)

I don't know but @Igor_Motov might know.


(Igor Motov) #5

@dadoonet this setting is a leftover from the initial implementation of snapshot restore in 1.0 when each repository was managing its own thread instead of using the common thread pool. We need to clean this up.

@adityajindal is your cluster located in AWS east as well, or is it located outside of AWS? Could you run the snapshot status command twice, with a 1 min interval between the runs, and share the results? (You can PM them to me if you don't want to post them publicly.)
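
The point of taking two samples is that the difference in processed bytes divided by the interval gives the effective upload rate. A minimal sketch of that arithmetic follows; the byte counts are hypothetical values chosen to give a round number, and they would normally be read from the processed-size figure in the snapshot status output (the exact field name varies by Elasticsearch version).

```python
def throughput_mb_per_sec(bytes_first, bytes_second, interval_seconds):
    """Average upload rate between two status samples, in MB/s (10^6 bytes)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (bytes_second - bytes_first) / interval_seconds / 1_000_000


# Hypothetical processed-byte counts from two status calls 60 s apart.
first_sample = 1_200_000_000_000
second_sample = 1_204_500_000_000

print(throughput_mb_per_sec(first_sample, second_sample, 60))  # 75.0
```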


(Aditya Jindal) #6

Hi @Igor_Motov,

My cluster is located in the same region as the s3 Bucket (us-east-1).

Please find the output of snapshot status, taken 1 min apart, here & here.


(Igor Motov) #7

I can see that it tries to upload on several threads but at the end of the day it looks like it's throttled to about 75mb per second. Did you try to upload a large file to S3 from this machine directly? I wonder what kind of throughput you get if elasticsearch is not involved to rule out AWS network throttling. Where do you have your data stored? What type of instance is it?

By the way, @tlrx please correct me if I am wrong, but I don't think reducing buffer_size would increase performance, because of the sequential nature of uploads. If anything, it should probably decrease it because of the additional request overhead.


(Aditya Jindal) #8

Hi

I have tried uploading a large file to S3 from this machine directly.

The throughput I am getting is around 550 MBytes per second for a large file. I was able to open 350 connections. Ref: Here

I am using i3.16xlarge instance type to store my data.

From this documentation, buffer_size is the largest size for a single-part upload. Assuming multipart upload gives better throughput, shouldn't decreasing this setting increase the number of multipart uploads and hence give better throughput?

Thanks in Advance!


(Tanguy) #9

That's also what I think. To execute a snapshot on S3, the snapshot service lists the shard's files to save and compares the length of each file with the chunk_size parameter (1gb by default for S3 repositories). If the file is larger than 1gb, it is split into multiple chunks. If the file is smaller than 1gb, there is 1 chunk. The snapshot service then starts to buffer the file, and if the buffer fills up it sends the file in multiple uploads (2 requests + 1 request per full buffer). If the buffer does not fill up, the file is sent using a single request.

Too low a buffer_size can increase the number of requests to the point where S3 limits are exceeded. But a low buffer_size is helpful on unstable networks, because only small parts of the file have to be resent if the network drops or a request fails. The best advice is to experiment, because it depends on the network and the indices. Using the same value for chunk_size and buffer_size can also help in finding the right settings.
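
To see how the two settings interact, the request count described above can be modelled roughly as follows. This is a simplified sketch, not the actual repository code. It assumes that a chunk no larger than buffer_size goes out in a single request, and that a trailing partial buffer counts as one more part of a multipart upload; the function names are made up for the illustration.

```python
MIB = 1024**2
GIB = 1024**3


def split_into_chunks(file_size, chunk_size):
    """Sizes of the chunks a file of file_size bytes is split into."""
    full, rest = divmod(file_size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def requests_for_chunk(chunk_len, buffer_size):
    """S3 requests for one chunk under the simplified model above."""
    if chunk_len <= buffer_size:
        return 1  # fits in the buffer: one single-request upload
    parts = -(-chunk_len // buffer_size)  # ceiling division
    return 2 + parts  # 2 extra requests around the multipart upload, plus the parts


def requests_for_file(file_size, chunk_size, buffer_size):
    """Total S3 requests needed to upload one file."""
    chunks = split_into_chunks(file_size, chunk_size)
    return sum(requests_for_chunk(c, buffer_size) for c in chunks)


# A hypothetical 2.5 GiB segment file with chunk_size = 1gb.
size = 2560 * MIB
print(requests_for_file(size, GIB, 100 * MIB))  # 34 with buffer_size = 100mb
print(requests_for_file(size, GIB, GIB))        # 3 with buffer_size = chunk_size
```

In this model, shrinking buffer_size only adds requests per chunk, which matches the point about request overhead and S3 limits.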

