We have S3 access logs being collected in a bucket, and we are using the S3 input plugin to index these files into ELK.
After a couple of months of usage we noticed an unusually high number of
requests to S3 (~1 billion/month), costing about $440. This is only the
charge for the number of requests, which is negligible for most use
cases, so it usually goes unnoticed.
When I looked at the billing reports, there were around 950 million HEAD requests made to the bucket that holds these logs.
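As a back-of-the-envelope check, the numbers roughly add up if the plugin issues one HEAD request per object on every poll. (Both the 60-second default poll interval and the one-HEAD-per-object behavior are my assumptions here, not something I have confirmed in the plugin source.)

```python
# Rough sanity check -- assumes the plugin polls every 60 s (the documented
# default interval) and issues one HEAD request per object per poll.
SECONDS_PER_MONTH = 30 * 24 * 3600
polls_per_month = SECONDS_PER_MONTH / 60          # 43,200 polls per month

head_requests = 950_000_000                       # from the billing report
objects_per_poll = head_requests / polls_per_month

print(round(objects_per_poll))                    # → 21991
```

That would mean roughly 22,000 log objects being re-checked on every single poll, which is plausible for a bucket that has accumulated a couple of months of hourly access logs.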
The S3 input plugin must be making all these requests (file watching?).
I am not sure whether some kind of optimization is needed on the plugin side.
I think the logs that people store in S3 don't change over time (my
assumption), so if a file is indexed already, there is no need to check it again.
From a user's perspective, the options I can think of to avoid these requests are:

- Move the files to a different location after indexing is done, using the backup_to_bucket option
- Download the files to a local drive with a cron job and use the file input plugin to index them into ES
- Use daily prefixes so that the plugin watches only those files; log files are named with timestamps
- Change the default interval to something higher if some delay is fine; S3 access logs are generated hourly, so there is an hour of delay anyway
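For reference, here is a rough sketch of how the backup, prefix, and interval tweaks above might look together in the s3 input. The bucket names and prefix are placeholders, and the option names are taken from the logstash-input-s3 documentation, so double-check them against your plugin version:

```
input {
  s3 {
    bucket           => "my-access-logs"        # placeholder bucket name
    prefix           => "2016-05-04"            # daily prefix narrows what the plugin watches
    interval         => 3600                    # poll hourly instead of the 60 s default
    backup_to_bucket => "my-access-logs-done"   # move indexed files out of the watched bucket
    delete           => true                    # remove the original after backup
  }
}
```

The prefix would need to be rotated daily (e.g. by templating the config or restarting with a new value), which is the main downside of that approach.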
Any opinions and suggestions are welcome.