How to calculate the number of requests going to S3 during a snapshot

Hi,
We are running an Elasticsearch cluster with 40 data nodes and 3 master nodes.
The data nodes are i3.4xlarge and the master nodes are m5.xlarge, running ES version 5.4.0.
We take a daily backup to S3 using the snapshot API. Here are the repository settings:

"type": "s3",
"settings": {
	"bucket" : "bucket",
	"chunk_size" : "100mb",
	"server_side_encryption" : "true",
	"max_restore_bytes_per_sec" : "10gb",
	"use_throttle_retries" : "true",
	"max_retries" : "20",
	"compress" : "true",
	"base_path" : "AA-cluster"
}

While taking a snapshot we get this error:
IndexShardSnapshotFailedException[Failed to write file list]; nested: IOException[com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 200; Error Code: SlowDown; Request ID: 9F00739FB2DEEAD2), S3 Extended Request ID: zFhvAAxsfAKBmkTWtRb+qz/bs3/wGQ5H7Vduc8rwRhWeOAvPApZVkZpvOvD3bt78y7m8WB/4eOs=]; nested: AmazonS3Exception[Please reduce your request rate. (Service: Amazon S3; Status Code: 200; Error Code: SlowDown; Request ID: 9F00739FB2DEEAD2)]; ",

We checked with AWS and they want to know how many requests we are making to S3.
The request rate limit is 3,500 requests per second, and they want to know whether we are crossing it.
Does anyone have any idea how to calculate the number of requests we make while snapshotting?
The cluster size is 93 TB.

Thanks
Sukanta


The number of requests is largely determined by the number of segments that have changed since the last snapshot, and the number of shards that are being snapshotted. Each segment to be uploaded is split into chunks and each chunk corresponds to at least one request. I notice that you've reduced the chunk_size from the default of 1gb down to 100mb, which will increase the number of requests on larger segments by a factor of 10. I would recommend changing this back to the default. You don't say how many shards you have in your cluster, but as a general rule fewer, larger shards will take fewer requests to snapshot.
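
As a very rough back-of-the-envelope illustration (this is not how the plugin counts requests internally, the segment sizes below are invented, and metadata/listing requests come on top of it):

import math

# Rough, illustrative estimate only: each changed segment file is split into
# ceil(size / chunk_size) chunks, and each chunk costs at least one PUT request.
def estimate_put_requests(changed_segment_sizes_mb, chunk_size_mb):
    return sum(math.ceil(size / chunk_size_mb) for size in changed_segment_sizes_mb)

# Example: one shard whose changed segments are 5 GB, 800 MB and 50 MB
segments_mb = [5120, 800, 50]
print(estimate_put_requests(segments_mb, chunk_size_mb=100))   # 61 requests with 100mb chunks
print(estimate_put_requests(segments_mb, chunk_size_mb=1024))  # 7 requests with 1gb chunks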

Elasticsearch does not currently limit its request rate to S3, but that sounds like a feature that might be possible to implement. Please open a feature request on GitHub if you think this would be useful.


Hi,

To hopefully add to David's suggestion:

How many snapshots are in that repo?
How many objects are in the bucket/prefix (the one used by this repo)?
How many indices/shards are in that cluster?
Do you force merge down to N segments before taking snapshots? Do you use force merge at all?

It would also be interesting to see whether you get the same results on a newer version, given the bump to the AWS SDK and the changes to the code behind the S3 repository. I believe a lot of work has gone into this since 5.4, and the AWS SDK version has gone up as well.
You don't happen to have a dev cluster with 93 TB of data, by any chance?

5.4.0 is old though; do you have a plan to move up to 5.6 and then 6.8?

Did you try setting a much larger number in max_retries? Maybe 20 is simply not enough to successfully build the file (blob) list with such a huge snapshot in the same repo (bucket). Have you also tried with throttled retries disabled?

But yeah, based on my current knowledge of S3 application architecture, this should be considered an application-side bug or misconfiguration: you should be able to manage PBs in a single S3 bucket with billions of objects, and the client code should keep retrying until it gets done as long as the errors are transient rather than fatal (throttling, 5XX, back pressure, etc.).

Even if you reduce your object count by a factor of 10 (or any other factor), someone else could have a snapshot that is simply bigger by that same factor, which would bring them right back to the same level.
Having billions of small objects in a bucket/prefix is not a problem for S3; it just costs more money than storing the same data in fewer objects. It's use-case dependent.

If the s3-repository plugin doesn't retry until it succeeds, with a proper exponential backoff policy, when doing its atomic operations, then you're in a pickle now or will be at some point in the future. Even more so if you can't further configure the AWS SDK S3 client used under the hood to customize its retry/throttling behaviour for your large use case.

Looking at the code, this exception appears (I'm not an expert on that code) to come from the part that builds a representation of what is in the bucket, not the part that actually PUTs the chunks/blobs. That makes sense to me, since 3,500 req/s x 100 MB/req is roughly 340 GB/s. I'm not saying you can't theoretically push that into S3 and make 3,500 PUTs of 100 MB per second, but you definitely can't do it with 40 i3.4xlarge nodes capped at roughly 1.25 GB/s of network throughput each... If you are being throttled while PUTting the chunks, it is certainly not at 3,500 req/s of 100 MB each. That said, I believe S3 can return SlowDown at a much lower rate while it scales the backend up to your required request rate for a prefix.
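
Back-of-the-envelope, to make the mismatch obvious (the ~1.25 GB/s per-node figure is an assumption about i3.4xlarge networking):

# Rough upper bounds only, to show the PUT path alone can't be producing 3,500 req/s of 100 MB chunks
nodes = 40
net_per_node_gb_s = 1.25                              # assumed i3.4xlarge cap, ~10 Gbit/s
cluster_max_gb_s = nodes * net_per_node_gb_s          # 50 GB/s aggregate, at the very best
throttle_rate_gb_s = 3500 * 100 / 1024                # ~342 GB/s if every request were a 100 MB PUT
print(cluster_max_gb_s, round(throttle_rate_gb_s))    # 50.0 342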

Which would also point to the application side not tolerating the throttling correctly and giving up fatally instead of adjusting. Profiling the request rate would indeed shed light on this part.
Maybe it's LISTing/GETting/HEADing to sync its local representation of the bucket, and that's when the throttling triggers enough retries to make the plugin give up.

Maybe @DavidTurner knows whether part of the operation actually consists of building/syncing such a local representation of the bucket contents; most S3 applications need to do that, but I'm not familiar enough with the s3-repository logic to say for sure.

For your specific question, to help you investigate the request rate produced by the plugin:

Metrics approach (a small boto3 sketch follows this list):

  1. How Do I Configure Request Metrics for an S3 Bucket?
  2. If you have many unrelated things in that same bucket: How Do I Configure a Request Metrics Filter?
  3. Concept and metric description
  4. Concept continued and config
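
Something along these lines should turn on request metrics for the repository prefix and then read the counts back from CloudWatch (a sketch only; the bucket name and filter id are placeholders, the prefix is taken from your base_path, and you should double-check the parameters against the boto3 docs):

import boto3
from datetime import datetime, timedelta

s3 = boto3.client("s3")
cw = boto3.client("cloudwatch")

BUCKET = "your-snapshot-bucket"    # placeholder: the bucket from your repository settings
FILTER_ID = "es-snapshot-repo"     # placeholder: any id you like for the metrics filter

# Enable S3 request metrics, scoped to the repository's base_path prefix
s3.put_bucket_metrics_configuration(
    Bucket=BUCKET,
    Id=FILTER_ID,
    MetricsConfiguration={
        "Id": FILTER_ID,
        "Filter": {"Prefix": "AA-cluster/"},
    },
)

# Later, during/after a snapshot: pull AllRequests at 1-minute resolution
resp = cw.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="AllRequests",
    Dimensions=[
        {"Name": "BucketName", "Value": BUCKET},
        {"Name": "FilterId", "Value": FILTER_ID},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)
for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
    print(dp["Timestamp"], dp["Sum"] / 60, "req/s averaged over the minute")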

Logs approach (then count/aggregate/analyse those logs to derive the request rate):

  1. Server Access Logging delivered to S3
  2. CloudTrail (with S3 data event logging enabled)

So for CloudTrail it would mean enabling S3 data events, then sending the trail to an S3 bucket and either downloading it for analysis, stuffing it into something like ES, or querying it with Athena (or something else) to derive the metrics you're interested in.
You could also send CloudTrail to CloudWatch Logs and derive metrics from the logs via either a CloudWatch Logs metric filter or CloudWatch Logs Insights.

For server access logging, it means enabling the logs on the bucket and then downloading them/loading them into a tool like ES or Athena (or something else) for analysis; a rough parsing sketch follows below.
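
For example, something like this could count requests per second from the downloaded access logs (a sketch only; it assumes the standard space-delimited S3 server access log format and a local directory of downloaded log files):

import collections
import glob
import re

# Pulls the bracketed timestamp (to the second) and the operation field out of an S3 access
# log line, e.g. ... [26/Jun/2019:04:15:21 +0000] 1.2.3.4 requester reqid REST.PUT.OBJECT key ...
LINE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [^\]]*\] \S+ \S+ \S+ (\S+)")

per_second = collections.Counter()   # (timestamp, operation) -> request count

for path in glob.glob("access-logs/*"):   # assumed local directory of downloaded logs
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                per_second[(m.group(1), m.group(2))] += 1

# Busiest seconds first
for (ts, op), count in sorted(per_second.items(), key=lambda kv: -kv[1])[:20]:
    print(ts, op, count, "req/s")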

I presented the options in the order I recommend you try them (from easiest to hardest).
I would also recommend asking AWS support how you should obtain the numbers they are asking for... Maybe they can get them from their backend without you doing anything, or maybe they have a suggestion I didn't mention or don't know about?

Other possibilities could involve routing all the requests from the s3-repository plugin through a proxy like Squid with access logs enabled, and then doing the same kind of log analysis as above, but on those logs...
Maybe ES can also be configured with debug logging such that the plugin makes the AWS SDK log every request it sends to S3 into an ES log file on each node. I don't know whether that's possible, or how to do it if it is; it would take someone else to enlighten you on that. Then gather and analyse as above.

Thanks David and Martinr for your help

David, I will try a chunk_size of 10gb and see if it helps.

Martinr, here I have tried to answer your questions:

How many snapshots are in that repo?

We keep the last 3 days of snapshots; the rest we delete.

How many objects are in the bucket/prefix (the one used by this repo)?

This bucket has 256 hexadecimal partitions, for example:
spr-prod-es-backup-group4/[0-9A-F][0-9A-F]
spr-prod-es-backup-group4-dr/[0-9A-F][0-9A-F]

So far we have 21 partitions created, like this:
0B
0C
0D
0E
0F
1B
1C
1D
1E
1A
0A
6A
2A
3A
4A
5A
7A
1F
8A
9A
AA

How many indices/shards are in that cluster?

No. Of Indices : 551
active_primary_shards : 4401
Doc Count : 122849007783

Do you force merge down to N segments before taking snapshots? Do you use force merge at all?

No, we do not force merge.

And we are using repository-s3-5.4.3-SNAPSHOT.zip

Making the chunk size 1gb improved things a lot.

The settings are now as follows:

{
	"spr-prod-es-backup-group4" : {
		"type" : "s3",
		"settings" : {
			"bucket" : "spr-prod-es-backup-group4",
			"chunk_size" : "1gb",
			"server_side_encryption" : "true",
			"max_restore_bytes_per_sec" : "10gb",
			"use_throttle_retries" : "false",
			"max_retries" : "100",
			"compress" : "true",
			"buffer_size" : "1gb",
			"base_path" : "AA-listening-tw-firehose4"
		}
	}
}

We kept the chunk size and buffer size the same because the docs say:
When the chunk size is greater than the value of buffer_size, Elasticsearch will split it into buffer_size fragments and use the AWS multipart API to send it.
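
If I read that correctly, the per-file request count would look roughly like this (a rough sketch of our reading of the docs, not verified against the plugin code; the 100mb buffer for the old settings is assumed):

import math

# Rough reading of the doc quote above: a file is cut into chunks of at most chunk_size;
# a chunk bigger than buffer_size goes through the multipart API in buffer_size parts
# (plus requests to initiate and complete the multipart upload), otherwise it is a single PUT.
def requests_for_file(file_size_mb, chunk_size_mb, buffer_size_mb):
    total = 0
    remaining = file_size_mb
    while remaining > 0:
        chunk = min(chunk_size_mb, remaining)
        if chunk > buffer_size_mb:
            total += math.ceil(chunk / buffer_size_mb) + 2  # parts + initiate + complete
        else:
            total += 1                                      # one plain PUT
        remaining -= chunk
    return total

# A 5 GB segment file under 100mb chunks vs the new 1gb chunk/buffer settings
print(requests_for_file(5120, chunk_size_mb=100,  buffer_size_mb=100))   # 52 requests
print(requests_for_file(5120, chunk_size_mb=1024, buffer_size_mb=1024))  # 5 requests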

Can you suggest whether these settings are correct?
Also, we are now seeing a different error while snapshotting:

"state" : "PARTIAL",
"start_time" : "2019-06-26T04:15:20.882Z",
"start_time_in_millis" : 1561522520882,
"end_time" : "2019-06-26T04:30:39.165Z",
"end_time_in_millis" : 1561523439165,
"duration_in_millis" : 918283,
"failures" : [
{
"index" : "tweet_p999999_v_2_20190430_1345",
"index_uuid" : "tweet_p999999_v_2_20190430_1345",
"shard_id" : 4,
"reason" : "IndexShardSnapshotFailedException[Failed to write file list]; nested: NoSuchFileException[Blob [pending-index-112] does not exist]; ",
"node_id" : "R7uaWsplQNmrOiqCEHQD4w",
"status" : "INTERNAL_SERVER_ERROR"
}
]

Does anyone have any idea why this error occurs?
