Optimizing Elasticsearch Snapshot Performance to S3-Compatible Object Storage

Hello,

I’m currently conducting performance testing for Elasticsearch snapshots targeting an S3-compatible object storage system.

System Configuration:

  • Elasticsearch Cluster: 2 nodes (master + data)
  • CPU: 8 cores
  • RAM: 32 GB
  • Storage: 11 TB SSD
  • Network: 100 Gbps

During a snapshot operation involving approximately 100 GB of data, I observed the following:

  • CPU Usage: 0–5%
  • Memory Usage: Peaked at 18%
  • Network Throughput: Averaged around 40 MB/s

Despite the high-speed network, the throughput seems relatively low. I used the default Elasticsearch snapshot settings for this test.

Questions:

  1. Is this level of performance considered optimal for such a setup?
  2. Are there recommended configuration changes—either at the repository or cluster level—that could improve snapshot throughput?
  3. Is there any official best practices guide or benchmark documentation from Elastic that I can refer to for performance tuning and comparison?

Any insights or references would be greatly appreciated.

You can tune the parameters governing snapshot creation. By default snapshotting is throttled so that it does not have a large negative impact on cluster performance while running; I believe max_snapshot_bytes_per_sec defaults to 40 MB/s per node, which lines up suspiciously well with the ~40 MB/s you measured.
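
If that throttle is the bottleneck, you can raise it on the repository. A minimal sketch, assuming an S3 repository named my_s3_repo and a purely illustrative limit — note that re-registering with PUT replaces the repository settings, so include your existing bucket/endpoint/client settings too:

    PUT _snapshot/my_s3_repo
    {
      "type": "s3",
      "settings": {
        "bucket": "my-bucket",
        "max_snapshot_bytes_per_sec": "500mb"
      }
    }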

Thanks! Any suggestions around the parameters?

See Snapshot and restore settings | Reference

Maybe snapshot.max_concurrent_operations?
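
For reference, that one is a dynamic cluster setting (default 1000) that caps how many snapshot operations can run concurrently, so as far as I know it mainly matters when running many snapshots in parallel rather than one large one. The value below is purely illustrative:

    PUT _cluster/settings
    {
      "persistent": {
        "snapshot.max_concurrent_operations": 2000
      }
    }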

Also how big are your shards?
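
You can check shard sizes with the cat shards API, e.g.:

    GET _cat/shards?v&s=store:desc&h=index,shard,prirep,store

As far as I know snapshotting parallelizes mainly across shards, so a handful of very large shards can cap throughput regardless of network speed.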

Have you tried playing with the following (a sketch of setting them is below)?

  • chunk_size
  • buffer_size
  • max_snapshot_bytes_per_sec
  • max_restore_bytes_per_sec
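
All four are repository-level settings, so a rough sketch of registering (or re-registering) an S3 repository with them — every name and value here is a placeholder to adapt:

    PUT _snapshot/my_s3_repo
    {
      "type": "s3",
      "settings": {
        "bucket": "my-bucket",
        "endpoint": "s3.storage.example.com",
        "chunk_size": "1gb",
        "buffer_size": "256mb",
        "max_snapshot_bytes_per_sec": "1gb",
        "max_restore_bytes_per_sec": "1gb"
      }
    }

buffer_size raises the threshold at which the S3 client switches to multipart uploads, and chunk_size controls how blobs are split; I'd change one setting at a time and re-measure so you can see which one actually moves the needle.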

Let us know how it goes.
