I tried increasing the maximum number of threads in the snapshot thread pool. The number of active threads increased during the snapshotting process, but without any increase in snapshotting speed.
Currently I am getting about 200 MB/s when snapshotting/restoring data to the S3 bucket.
I have checked that my S3 bucket supports far higher throughput (around 2 GB/s) than this.
Also, when I am restoring a snapshot from Amazon S3, I see active threads from the generic thread pool instead of the snapshot thread pool.
Following the same documentation, I tried decreasing buffer_size so that we get multiple multipart uploads to S3, but couldn't get any significant improvement.
While browsing the code I came across this setting for increasing the number of concurrent streams that write to S3.
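For reference, this is roughly how the buffer_size tuning mentioned above can be applied when registering an S3 repository. A minimal sketch in Python; the repository and bucket names are placeholders and the values are examples I experimented with, not recommendations:

```python
import requests

# Sketch: register an S3 repository with explicit chunk_size/buffer_size.
# "my_s3_repo" and "my-snapshot-bucket" are placeholders.
resp = requests.put(
    "http://localhost:9200/_snapshot/my_s3_repo",
    json={
        "type": "s3",
        "settings": {
            "bucket": "my-snapshot-bucket",
            "chunk_size": "1gb",     # files larger than this are split into chunks
            "buffer_size": "100mb",  # above this size a chunk is sent as a multipart upload
        },
    },
)
print(resp.status_code, resp.json())
```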
@dadoonet this setting is a leftover from the initial implementation of snapshot/restore in 1.0, when each repository was managing its own threads instead of using the common thread pool. We need to clean this up.
@adityajindal is your cluster located in AWS east as well, or is it located outside of AWS? Could you run the snapshot status command twice with a 1 min interval between the runs and share the results (you can PM them to me if you don't want to post them publicly)?
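Something along these lines would capture the two samples (a sketch in Python; the repository and snapshot names are placeholders):

```python
import json
import time

import requests

# Sketch: capture the snapshot status twice, one minute apart, so progress can be compared.
url = "http://localhost:9200/_snapshot/my_s3_repo/my_snapshot/_status"

first = requests.get(url).json()
time.sleep(60)
second = requests.get(url).json()

# Save both samples to share later.
with open("snapshot_status_samples.json", "w") as f:
    json.dump({"t0": first, "t60": second}, f, indent=2)
```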
I can see that it tries to upload on several threads, but at the end of the day it looks like it's throttled to about 75 MB per second. Did you try to upload a large file to S3 from this machine directly? I wonder what kind of throughput you get if Elasticsearch is not involved, to rule out AWS network throttling. Where do you have your data stored? What type of instance is it?
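As a sketch of what I mean, something like this (boto3, with a placeholder bucket name and file path) would give a baseline upload throughput from the node without Elasticsearch involved:

```python
import os
import time

import boto3
from boto3.s3.transfer import TransferConfig

# Sketch: time a raw multipart upload to S3 from this machine.
bucket = "my-snapshot-bucket"   # placeholder
path = "/tmp/large_test_file"   # placeholder, e.g. a multi-GB file

config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=16,
)

s3 = boto3.client("s3")
start = time.time()
s3.upload_file(path, bucket, "throughput-test", Config=config)
elapsed = time.time() - start
print(f"{os.path.getsize(path) / elapsed / 1024 / 1024:.1f} MB/s")
```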
By the way, @tlrx please correct me if I am wrong, but I don't think reducing buffer_size would increase performance, because of the sequential nature of uploads. If anything, it should probably decrease it because of the additional request overhead.
I have tried uploading a large file to S3 from this machine directly.
The throughput I am getting is around 550 MB per second for a large file. I was able to open 350 connections. Ref: Here
I am using an i3.16xlarge instance type to store my data.
From this documentation, buffer_size is the largest size for a single-part upload. Assuming multipart uploads give better throughput, shouldn't decreasing this setting increase the number of multipart uploads and hence improve throughput?
That's also what I think. To execute a snapshot on S3, the snapshot service lists the shard's files to save and compares the length of each file with the chunk_size parameter (1 GB by default for S3 repositories). If a file is larger than 1 GB it is split into multiple chunks; if it is smaller than 1 GB there is a single chunk. The snapshot service then starts to buffer the file: if the buffer fills up, the file is sent as a multipart upload (2 requests + 1 request per full buffer); if the buffer is not full, the file is sent in a single request.
A buffer_size that is too low can increase the number of requests, to the point where S3 limits are exceeded. But a low buffer_size is helpful on unstable networks, because only small parts of the file have to be re-sent if the network drops or a request fails. The best advice is to experiment, because it depends on the network and the indices. Using the same value for chunk_size and buffer_size can also help to find the right settings.
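As a rough illustration of the request counts described above (a back-of-the-envelope sketch that mirrors this description, not the actual implementation; the sizes are just examples):

```python
import math

def estimate_requests(file_size, chunk_size, buffer_size):
    """Roughly estimate the S3 requests needed to snapshot one file."""
    chunks = max(1, math.ceil(file_size / chunk_size))
    total = 0
    remaining = file_size
    for _ in range(chunks):
        part = min(chunk_size, remaining)
        remaining -= part
        if part <= buffer_size:
            total += 1  # small enough for a single-part upload
        else:
            # multipart: initiate + complete, plus roughly one request per buffer-sized part
            total += 2 + math.ceil(part / buffer_size)
    return total

GB = 1024 ** 3
MB = 1024 ** 2
print(estimate_requests(5 * GB, chunk_size=1 * GB, buffer_size=100 * MB))  # larger buffer, fewer requests
print(estimate_requests(5 * GB, chunk_size=1 * GB, buffer_size=10 * MB))   # smaller buffer, many more requests
```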