[ES v6.7.2] Speed up snapshot restore from GCS Snapshot Repository

Akshay_KN · January 3, 2022, 11:26am

Hello Team,

We are trying to speed up snapshot restores on our ES cluster, which is hosted on GCP Compute instances.

TL;DR :

Current performance: 50 MBps per data node
Infra capable of: 500 MBps per data node
We want to raise our restore speed to the maximum disk throughput (the infrastructure itself is not throttling us).

Infra Details :

5 Master | 7 Data | 2 Coordinating nodes
Each node has 16 cores / 32 GB RAM (heap size: 16 GB)
Each Data instance supports max 25k disk IOPS (500 MBps Throughput)

Current restore performance: about 3 Gbps across the whole cluster (50 MBps per data node).

We are currently getting only 10% of the available disk write throughput per node, and we are looking for ways to improve it.
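For readers checking the numbers, here is a minimal sketch of the arithmetic behind the figures quoted above. It assumes decimal units (1 Gbit/s = 1000 Mbit/s) and that only the 7 data nodes take part in the restore; the function and constant names are illustrative, not from the post.

```python
# Figures quoted in the post; decimal units are an assumption.
DATA_NODES = 7
CURRENT_MBPS_PER_NODE = 50   # observed restore rate, megabytes per second
DISK_MBPS_PER_NODE = 500     # disk throughput ceiling, megabytes per second


def cluster_gbps(mbytes_per_sec_per_node: float, nodes: int) -> float:
    """Convert a per-node rate in MB/s into a cluster-wide rate in Gbit/s."""
    return mbytes_per_sec_per_node * nodes * 8 / 1000


current_gbps = cluster_gbps(CURRENT_MBPS_PER_NODE, DATA_NODES)
ceiling_gbps = cluster_gbps(DISK_MBPS_PER_NODE, DATA_NODES)
utilisation = CURRENT_MBPS_PER_NODE / DISK_MBPS_PER_NODE

print(f"current: {current_gbps:.1f} Gbps, ceiling: {ceiling_gbps:.1f} Gbps, "
      f"utilisation: {utilisation:.0%}")
```

Under these assumptions the current rate works out to 2.8 Gbps (rounded to 3 Gbps in the post), against a disk-bound ceiling of 28 Gbps for the cluster.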

What we have already tried:

cluster.routing.allocation.node_concurrent_recoveries: 30
cluster.routing.allocation.node_initial_primaries_recoveries: 30
indices.recovery.max_bytes_per_sec: 20gb
indices.recovery.max_concurrent_file_chunks: 5
thread_pool.bulk.queue_size: 2000
thread_pool.bulk.size: 16
thread_pool.index.queue_size: 2000
thread_pool.index.size: 16
thread_pool.snapshot.core: 10
thread_pool.snapshot.max: 50
transport.connections_per_node.recovery: 10

We also tested the restore speed on the following ES versions:

6.8.12 - No improvement in speed.
7.10.2 - Massive improvement in speed: 28 Gbps across the 7 data nodes.

We are unable to identify which configuration setting is throttling the network throughput.

DavidTurner · January 3, 2022, 4:04pm

The best solution IMO is to upgrade to a newer version. 6.7 is nearly 3 years old and well over a year past EOL, so it's no longer supported and you're missing out on several years of performance improvements by using such an ancient version.

You should reset the settings you changed to their defaults. The values you list can make performance worse and can even lead to cluster instability. Increasing indices.recovery.max_bytes_per_sec is acceptable, but the value you use should not exceed your actual disk throughput.
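As a rough illustration of that advice, the sketch below builds a request body for the cluster update settings API (PUT _cluster/settings), where a null value removes an override so the default applies again. It assumes the dynamic settings from the original post were applied through that API rather than in elasticsearch.yml; since the post does not say whether they were persistent or transient, it clears both. The helper name is hypothetical, and the "500mb" value is only an example taken from the disk throughput quoted in the post. The thread_pool.* and transport.connections_per_node.* entries are, as far as I know, static node settings, so they would need to be removed from elasticsearch.yml instead.

```python
import json

# Dynamic settings from the original post. Assumption: they were applied via
# PUT _cluster/settings, not via elasticsearch.yml.
OVERRIDES_TO_CLEAR = [
    "cluster.routing.allocation.node_concurrent_recoveries",
    "cluster.routing.allocation.node_initial_primaries_recoveries",
    "indices.recovery.max_concurrent_file_chunks",
]


def reset_settings_body(keys, recovery_rate=None):
    """Build a body that clears overrides (None becomes JSON null).

    recovery_rate, if given, is kept as a persistent value for
    indices.recovery.max_bytes_per_sec; otherwise that setting is cleared too.
    """
    cleared = {key: None for key in keys}
    cleared["indices.recovery.max_bytes_per_sec"] = None
    body = {"persistent": dict(cleared), "transient": dict(cleared)}
    if recovery_rate is not None:
        body["persistent"]["indices.recovery.max_bytes_per_sec"] = recovery_rate
    return body


# Example: clear everything, but cap recovery at the quoted disk throughput.
body = reset_settings_body(OVERRIDES_TO_CLEAR, recovery_rate="500mb")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent with any HTTP client; nothing here contacts a cluster.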

I forget the details of snapshotting in such old versions but in newer versions the repository setting max_restore_bytes_per_sec defaults to 40mb so it might help to increase this too. Even if it does help, you should still upgrade as a matter of some urgency.
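For illustration only, max_restore_bytes_per_sec is set on the repository itself, so raising it means re-registering the repository (PUT _snapshot/<repository name>) with the new value. The sketch below builds such a body for a GCS repository. The bucket name and the "500mb" rate are placeholders, the helper name is hypothetical, and whether this setting is what limits restores on 6.7 is exactly the open question in the reply above.

```python
import json


def gcs_repository_body(bucket: str, restore_rate: str) -> dict:
    """Build a repository registration body for the repository-gcs plugin."""
    return {
        "type": "gcs",
        "settings": {
            "bucket": bucket,  # placeholder, use your real bucket
            "max_restore_bytes_per_sec": restore_rate,  # per-node restore throttle
        },
    }


# Hypothetical bucket name; the rate mirrors the disk throughput from the post.
repo_body = gcs_repository_body("my-snapshot-bucket", "500mb")
print(json.dumps(repo_body, indent=2))
```

Any other settings already present on the repository (for example a base path or client name) would need to be included in the same body, since re-registering replaces the stored settings.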

Akshay_KN · January 5, 2022, 5:19am

Thank you for the updates.

We will restore the values back to the defaults.

