Ingesting High Volume of AWS Flowlogs

sam6 · July 14, 2020, 2:59pm

Hi,

I'm currently using the Logstash S3 input plugin to ingest flow log data from an S3 bucket, but it isn't pulling in data quickly enough and is falling behind. I've tried raising the max batch size to 20000 in logstash.yml and setting the S3 input interval to 2 seconds, to no avail.
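
For reference, the settings described above would look roughly like the sketch below. The bucket name and region are placeholders, not values from my setup, and the batch size lives in logstash.yml rather than in the pipeline file, so it is only shown as a comment.

```
# In logstash.yml (separate file):
#   pipeline.batch.size: 20000

input {
  s3 {
    bucket   => "example-flowlogs-bucket"  # placeholder bucket name
    region   => "eu-west-1"                # placeholder region
    interval => 2                          # seconds to wait between bucket listings
  }
}
```

Note that the batch size controls how many events each worker collects before running filters and outputs; it does not by itself make the input list or download S3 objects any faster.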

I can't see a way to create multiple pipelines without the potential for duplication. I do have the input set to move processed objects to another bucket, but another pipeline could still read the same object at the same time.
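
One possible way around the duplication concern, sketched here as an untested assumption rather than something I have running, is to give each pipeline its own `prefix` so that no two inputs ever list the same objects. This assumes the flow logs land under region-specific key prefixes; the bucket names, account ID, and regions below are all placeholders.

```
# Pipeline A config: only lists objects under the eu-west-1 prefix
input {
  s3 {
    bucket           => "example-flowlogs-bucket"     # placeholder
    region           => "eu-west-1"                   # placeholder
    prefix           => "AWSLogs/123456789012/vpcflowlogs/eu-west-1/"  # placeholder account ID
    backup_to_bucket => "example-flowlogs-processed"  # placeholder; copy processed objects here
    delete           => true                          # remove the original after processing
  }
}

# Pipeline B config (separate file): same settings, different prefix
# input {
#   s3 {
#     bucket => "example-flowlogs-bucket"
#     region => "eu-west-1"
#     prefix => "AWSLogs/123456789012/vpcflowlogs/us-east-1/"
#     ...
#   }
# }
```

Because the prefixes do not overlap, the two inputs should never compete for the same object, though the split only helps if the volume is spread reasonably evenly across the prefixes.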

Basically, I'm just looking for some advice on the best approach to this, as I'm about to start writing something that will pull in multiple S3 objects and create local log files on the Logstash server to ingest. Any suggestions would be greatly appreciated!

sam6 · July 15, 2020, 10:16am

Just in case anyone runs into similar issues: it looks like my problem may have been caused by multiple inputs running in the same pipeline, which delayed each execution of the S3 input. I also disabled watch_for_new_files. It seems to be processing a lot faster now, although it's still only just keeping up.
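
A minimal sketch of that change, assuming the S3 input is moved into a pipeline of its own (the bucket name and region are placeholders):

```
# Dedicated pipeline config containing only the S3 input,
# so no other input shares this pipeline with it.
input {
  s3 {
    bucket              => "example-flowlogs-bucket"  # placeholder
    region              => "eu-west-1"                # placeholder
    interval            => 2
    watch_for_new_files => false  # process a single listing instead of polling continuously
  }
}
```

As far as I understand the plugin documentation, with watch_for_new_files disabled the input closes itself after processing one listing, so it is worth checking that this fits how the pipeline gets restarted.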

