Unusual number of HEAD requests being made by S3 input plugin

vangap · June 18, 2015, 9:02am

We have S3 access logs being collected in a bucket. We are using the S3 input plugin to index these files into ELK.

After a couple of months of usage, we noticed an unusually high number of
requests made to S3 (~1 billion/month), which costs $440. This is the
request charge alone, which is negligible for most use cases and which
hardly anyone pays attention to.

When I looked at the billing reports, there were around 950 million HEAD requests made to the bucket that holds these logs.

The S3 input plugin must be making all these requests (file watching?).
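
A rough back-of-the-envelope check of that guess, as a sketch only. It assumes the plugin issues one HEAD request per object in the bucket on every poll (an assumption, not confirmed from the plugin source), a polling interval of 60 seconds (the plugin's documented default), and a 30-day month. The function names are made up for illustration.

```python
# Assumption: a 30-day billing month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def polls_per_month(interval_s: int) -> int:
    """Number of polling cycles in a month for a given interval in seconds."""
    return SECONDS_PER_MONTH // interval_s


def implied_objects_per_poll(head_requests: int, interval_s: int) -> float:
    """Objects checked per poll, ASSUMING one HEAD request per object per poll."""
    return head_requests / polls_per_month(interval_s)


if __name__ == "__main__":
    # 950 million HEAD requests per month, taken from the billing report.
    objects = implied_objects_per_poll(950_000_000, 60)
    print(f"polls per month: {polls_per_month(60)}")
    print(f"implied objects checked per poll: {objects:.0f}")
```

Under those assumptions, the billing figure would correspond to about 22,000 objects being checked on every poll, which is plausible for a bucket that has been collecting access logs for a few months.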

I am not sure whether some kind of optimization is needed on the plugin side.

I assume the logs that people store in S3 don't change over time, so
once a file has been indexed there is no need to keep watching it.

From a user's perspective, the options I can think of to avoid these requests are:

- Move the files to a different location after indexing is done, using the backup_to_bucket option.
- Download the files to a local drive using a cron job and use the file input plugin to index them into ES.
- Use daily prefixes so that the plugin watches only those files; the log files are named with timestamps.
- Change the default interval to something higher if some delay is acceptable. S3 access logs are generated hourly, so there is an hour's delay anyway.
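
To make the daily-prefix option concrete, here is a minimal sketch. It assumes the standard S3 server access log key layout (target prefix followed by YYYY-mm-DD-HH-MM-SS-UniqueString); the `logs/` prefix, the sample keys, and the function names are hypothetical. The resulting string is what would go into the plugin's `prefix` setting; how that setting gets updated each day is left open.

```python
from datetime import date


def daily_prefix(target_prefix: str, day: date) -> str:
    """Build the key prefix shared by all access logs written on `day`.

    Assumes keys look like <target_prefix>YYYY-mm-DD-HH-MM-SS-<UniqueString>.
    """
    return f"{target_prefix}{day:%Y-%m-%d}"


def keys_for_day(keys: list[str], target_prefix: str, day: date) -> list[str]:
    """Keep only the keys that fall under the daily prefix."""
    prefix = daily_prefix(target_prefix, day)
    return [key for key in keys if key.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    sample_keys = [
        "logs/2015-06-17-23-15-42-0123456789ABCDEF",
        "logs/2015-06-18-08-20-11-FEDCBA9876543210",
        "logs/2015-06-18-09-20-37-0A1B2C3D4E5F6071",
    ]
    today = date(2015, 6, 18)
    print(daily_prefix("logs/", today))
    print(keys_for_day(sample_keys, "logs/", today))
```

With a prefix like this, only one day's worth of objects would be listed and checked on each poll instead of the whole bucket.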

Any opinions and suggestions are welcome.

Thanks
