Ideal settings for buffer_size, chunk size, retry_count, max_snapshot_bytes_per_sec for s3 backup

akshaymaniyar · December 30, 2017, 7:52am

Hi,

We have around 150 nodes (data size: 14 TB) in our Elasticsearch cluster and want to take a snapshot of our data using an S3-compatible service.

We want to restrict the overall bandwidth used by the cluster for the snapshot process to 300 MBPS. The only way we found to do that was to work out the number of nodes ('x') that will actually participate in backing up the cluster, divide 300 by 'x', and set max_snapshot_bytes_per_sec to that value.
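For illustration, the division described above can be sketched as follows. This is a minimal sketch, not a recommended configuration: it assumes "MBPS" means megabytes per second, the bucket name `my-backup-bucket` is a placeholder, and the helper names are made up for this example. The resulting dictionary has the shape of the body sent when registering an S3 snapshot repository (`PUT _snapshot/<repository>`).

```python
import json


def per_node_limit(total_mb_per_sec: int, participating_nodes: int) -> str:
    """Split a cluster-wide budget evenly across nodes.

    Returns a byte-size string in whole kilobytes, rounded down so the
    sum over all nodes never exceeds the budget.
    """
    if participating_nodes <= 0:
        raise ValueError("participating_nodes must be positive")
    kb = (total_mb_per_sec * 1024) // participating_nodes
    return f"{kb}kb"


def repository_body(bucket: str, limit: str) -> dict:
    """Build a repository registration body with a per-node throttle."""
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,  # placeholder bucket name
            "max_snapshot_bytes_per_sec": limit,
        },
    }


if __name__ == "__main__":
    # 300 MB/s shared by 112 participating nodes (assumed units).
    body = repository_body("my-backup-bucket", per_node_limit(300, 112))
    print(json.dumps(body, indent=2))
```

Because the limit is applied on each node independently, the cluster only approaches the full budget while all 'x' nodes are uploading at the same time.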

However, there is a downside to this setting: during the final stages of the snapshot, most of the nodes have finished pushing data and only a few nodes remain, which makes the final stage very slow. As an example, while taking a backup of around 2 TB of data (pushed from 112 nodes), the last 85 GB was pushed from 3-4 machines and took around 6-7 hours to finish because of the bandwidth limit set per node.
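The slow tail follows directly from the per-node cap: once only a few nodes are still uploading, the aggregate rate is the number of active nodes times the per-node limit, far below the cluster budget. A rough sketch of that arithmetic is below. The function names are hypothetical and the per-node rate is an input, since the exact value used for this run is not stated here.

```python
def tail_hours(remaining_gb: float, active_nodes: int,
               per_node_mb_per_sec: float) -> float:
    """Estimate hours needed to push the remaining data.

    Assumes every active node uploads at exactly its per-node cap
    and the remaining data is spread evenly across them.
    """
    aggregate_mb_per_sec = active_nodes * per_node_mb_per_sec
    return remaining_gb * 1024 / aggregate_mb_per_sec / 3600


def budget_used(active_nodes: int, per_node_mb_per_sec: float,
                total_mb_per_sec: float) -> float:
    """Fraction of the cluster-wide budget in use by the active nodes."""
    return active_nodes * per_node_mb_per_sec / total_mb_per_sec


if __name__ == "__main__":
    per_node = 300 / 112  # even split of an assumed 300 MB/s budget
    print(f"tail estimate: {tail_hours(85, 4, per_node):.1f} h")
    print(f"budget in use: {budget_used(4, per_node, 300):.1%}")
```

The real duration also depends on how unevenly the remaining shards are distributed, so this is only a lower bound on how long the tail takes.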

To avoid this, we removed max_snapshot_bytes_per_sec, set buffer_size to 8mb and the retry count to 3 with throttle set to true, and applied a QPS limit of 40 at our S3-compatible service. Above 40 QPS, the service returns a 504 slowdown error code. Now backups sometimes complete successfully, but at other times some shards don't get backed up, and the error message for those shards shows the 504 slowdown.
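As a rough check of how a request-rate limit maps onto bandwidth: if each upload request carries at most one buffer_size part, the QPS limit times the part size gives an upper bound on throughput. The sketch below assumes exactly that. Smaller files and metadata requests also count against the QPS limit while carrying far less data, so real throughput will be lower, and the function names are made up for this example.

```python
import math


def throughput_ceiling_mb_per_sec(qps_limit: int, part_size_mb: int) -> int:
    """Upper bound on upload rate if every request is a full-size part."""
    return qps_limit * part_size_mb


def min_requests(data_mb: int, part_size_mb: int) -> int:
    """Fewest upload requests needed to move data_mb in full-size parts."""
    return math.ceil(data_mb / part_size_mb)


if __name__ == "__main__":
    ceiling = throughput_ceiling_mb_per_sec(40, 8)
    print(f"ceiling at 40 QPS with 8 MB parts: {ceiling} MB/s")
    print(f"requests for 85 GB: {min_requests(85 * 1024, 8)}")
```

A QPS limit enforced at the service is cluster-wide and is not coordinated between nodes, so any node can be rejected once the other nodes have used up the quota. A shard whose requests are rejected more times than the retry count allows would fail in the way described above.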

What is the best way to take backups in our case, given the constraint that we shouldn't use more than 300 MBPS of network bandwidth?

akshaymaniyar · January 4, 2018, 4:09pm

BUMP

