We are trying to speed up snapshot restores in our ES cluster, which is hosted on GCP Compute instances.
- Current Performance: 56 MBps per data node
- Infrastructure is capable of 500 MBps per data node
- We want to raise our restore speed to the maximum disk throughput (no throttling from infrastructure).
- 3 Master | 10 Data Nodes
- Each node has 8 cores / 16 GB RAM (heap size: 8 GB)
- Each Data instance supports max 15k disk IOPS (500 MBps Throughput)
Current Restore Performance: 4.5 Gbps on the whole cluster (56 MBps per node).
We are currently getting only about 10% of the available disk write throughput and are looking for options to improve it.
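As a sanity check on the figures above, the per-node rate can be converted to a cluster-wide rate and compared with the disk limit. This is only a sketch: the numbers come from this post, the function names are made up for illustration, and decimal units (1 Gbit = 1000 Mbit) are assumed.

```python
# Figures from the post; decimal units (1 Gbit = 1000 Mbit) are an assumption.
DATA_NODES = 10
PER_NODE_RESTORE_MBPS = 56   # observed restore rate per data node, MB/s
PER_NODE_DISK_MBPS = 500     # disk throughput limit per data node, MB/s


def cluster_gbps(per_node_mbps: float, nodes: int) -> float:
    """Convert a per-node rate in MB/s to a cluster-wide rate in Gbit/s."""
    return per_node_mbps * nodes * 8 / 1000


def utilisation(observed_mbps: float, limit_mbps: float) -> float:
    """Fraction of the limit that is actually being used."""
    return observed_mbps / limit_mbps


if __name__ == "__main__":
    total = cluster_gbps(PER_NODE_RESTORE_MBPS, DATA_NODES)
    used = utilisation(PER_NODE_RESTORE_MBPS, PER_NODE_DISK_MBPS)
    print(f"cluster restore rate: {total:.2f} Gbit/s")
    print(f"share of disk throughput used: {used:.1%}")
```

This gives roughly 4.5 Gbps for the cluster and a little over 11% of the per-node disk limit, which matches the numbers we quoted.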
Note: We have already ruled out infra-related throttling. Using gsutil -m, we saw the download speed reach 450 MBps on one of the data nodes.
What we have already tried:
0 in our GCS snapshot repository.
We are unable to figure out which config is throttling the network performance.
We also increased the number of data nodes, to check whether the throttling was happening on individual data nodes:
Changed Data nodes count from 10 to 20.
Result: Speed still throttled at 4.5 Gbps.
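To make the result above concrete, here is a rough calculation of what the unchanged cluster total implies per node. It assumes restore traffic is spread evenly across data nodes, which we have not verified, and the helper name is hypothetical.

```python
def per_node_mbps(cluster_gbps: float, nodes: int) -> float:
    """Average per-node rate in MB/s for a given cluster-wide rate in Gbit/s.

    Assumes traffic is spread evenly across nodes and decimal units.
    """
    return cluster_gbps * 1000 / 8 / nodes


if __name__ == "__main__":
    CLUSTER_GBPS = 4.5  # observed cluster-wide restore rate, before and after scaling
    for nodes in (10, 20):
        rate = per_node_mbps(CLUSTER_GBPS, nodes)
        print(f"{nodes} data nodes: about {rate:.0f} MB/s per node")
```

Under that assumption, doubling the node count roughly halved the per-node rate, so the limit looks like it applies to the cluster as a whole rather than to each data node.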