Elasticsearch Backups Are Failing on S3

Hi,

We are taking Elasticsearch backups using the S3 repository plugin. Here are the repository settings and the create snapshot command.

Register repo:
curl -XPUT 'http://localhost:9200/_snapshot/repo_name' -d '{
  "type": "s3",
  "settings": {
    "bucket": "bucket_name",
    "base_path": "F1xL-cluster_name",
    "compress": true,
    "chunk_size": "10m",
    "max_snapshot_bytes_per_sec": "40mb",
    "max_restore_bytes_per_sec": "400mb",
    "concurrent_streams": "20",
    "buffer_size": "5mb",
    "max_retries": 3
  }
}'

Create snapshot:
curl -XPUT 'http://localhost:9200/_snapshot/repo_name/cluster_name-20160421?wait_for_completion=false'
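Since we pass wait_for_completion=false, the call returns immediately; for reference, we then check progress with the snapshot status API (same repo and snapshot names as above):

curl -XGET 'http://localhost:9200/_snapshot/repo_name/cluster_name-20160421/_status'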

Once we trigger the snapshot, after some time it fails with the following error:

  "node_id" : "v5gjJl-sRySq9k7dEZj0mw",
  "index" : "monitoring_20160421",
  "reason" : "IndexShardSnapshotFailedException[[monitoring_20160421][0] Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: 0A80808C369128BE)]; nested: AmazonS3Exception[Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: 0A80808C369128BE)]; ",
  "shard_id" : 0,
  "status" : "INTERNAL_SERVER_ERROR"
} ],

Can someone please help here? How do we check the request rate, and how do we resolve this issue?

Thanks

We have received that error message from S3 before. If you have a support contract with AWS, you can ask your rep to manually partition your bucket. After they've done that, the bucket can support much higher PUT rates.
See http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html.

However, for Elasticsearch backups, I think you may just want to reduce the concurrent_streams setting. We were able to take S3 snapshots with the default settings without any issues.
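For example, you could re-register the repository with a lower value. This is just a sketch using the names from your post; the value 5 is an arbitrary starting point, and note that re-registering replaces the settings, so keep any others you still want:

curl -XPUT 'http://localhost:9200/_snapshot/repo_name' -d '{
  "type": "s3",
  "settings": {
    "bucket": "bucket_name",
    "base_path": "F1xL-cluster_name",
    "concurrent_streams": "5"
  }
}'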

Hi cdurbin,

Thanks for the help.

We did connect with the AWS team and asked them to partition the bucket. They did, but we still see the same error: "Please reduce your request rate".

Is there a way to measure the request rate against the limit?

I don't know of a way to monitor your request rate using AWS tools. Something doesn't sound right though. After having our bucket manually partitioned we were writing 10,000 objects per second across 700 concurrent connections sustained for 12 hours. Our rep told us we could achieve up to 50,000 transactions per second after the partitioning.

I don't know much about the cloud-aws Elasticsearch plugin. We created a new bucket (not partitioned for high performance) and used the default settings for all of the parameters, except that we changed max_retries to 5. With those settings (and an 11-node cluster with 1 TB of data) we have not seen any rate-limit issues from snapshots.

My recommendation would be to create a new snapshot repo using the default settings and test.
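Something along these lines, keeping only the required settings plus the max_retries bump we used (the repo and bucket names here are placeholders):

curl -XPUT 'http://localhost:9200/_snapshot/test_repo' -d '{
  "type": "s3",
  "settings": {
    "bucket": "your_new_bucket",
    "max_retries": 5
  }
}'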