Ingesting a High Volume of AWS Flow Logs


I'm currently using the Logstash S3 input plugin to ingest flow log data from an S3 bucket, but it isn't pulling in data quickly enough and is falling behind. I've tried raising `pipeline.batch.size` to 20000 in logstash.yml and setting the S3 input's `interval` to 2 seconds, to no avail.
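Roughly what I have at the moment, for reference (bucket names are placeholders):

```yaml
# logstash.yml
pipeline.batch.size: 20000
```

```conf
input {
  s3 {
    bucket           => "my-flowlog-bucket"    # placeholder
    interval         => 2                      # poll every 2 seconds
    backup_to_bucket => "my-processed-bucket"  # move processed objects out of the way
    delete           => true
  }
}
```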

I can't see a way to create multiple pipelines without the potential for duplication. I do have the input configured to move processed objects to another bucket, but a second pipeline could still read the same object at the same time.

Basically I'm just looking for advice on the best approach here, as I'm about to start writing something that pulls in multiple S3 objects and creates local log files on the Logstash server for ingestion. Any suggestions would be greatly appreciated!

Just in case anyone runs into similar issues: it looks like my problem was caused by multiple inputs running in the same pipeline, which introduced a delay between executions of the S3 input. I also disabled `watch_for_new_files`. It's processing a lot faster now, although it's still quite close.
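For anyone finding this later, splitting the inputs into their own pipelines looks roughly like this in pipelines.yml (pipeline IDs and paths are placeholders):

```yaml
# pipelines.yml - give the S3 input its own pipeline so other inputs can't delay it
- pipeline.id: flowlogs-s3
  path.config: "/etc/logstash/conf.d/flowlogs-s3.conf"
- pipeline.id: other-inputs
  path.config: "/etc/logstash/conf.d/other-inputs.conf"
```

And the S3 input itself, with `watch_for_new_files` disabled:

```conf
input {
  s3 {
    bucket              => "my-flowlog-bucket"    # placeholder
    interval            => 2
    watch_for_new_files => false
    backup_to_bucket    => "my-processed-bucket"
  }
}
```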
