Logstash fine-tuning for ingesting more events (S3 input)

antonisnyc94 · May 1, 2022, 6:59pm

Hello,

We are sending VPC flow log events from multiple AWS accounts into a central S3 bucket, and because of the large number of events we are always 5-6 days behind in Elasticsearch. I already set batch.size to 6000 and batch.delay to 1, but the number of events read did not increase. Do you have any suggestions on how to increase the rate at which events are read from an S3 bucket?

Thanks,
Tony

leandrojmp · May 1, 2022, 9:46pm

Do the VPC flow logs create a large number of files, like the CloudTrail logs do?

If that is the case, there is not much you can do: the listing process is very slow when the bucket contains a large number of files. There is an open issue about it, but no updates so far.

I had a similar issue with CloudTrail logs, and I was able to make the s3 input at least usable by setting the prefix option in the input. Since the prefix cannot be changed dynamically, I'm using an external tool to edit the Logstash configuration file daily.

The issue is that the s3 input lists everything in the bucket, and if you have millions of files in the bucket this can take a long time.
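To give a rough sense of why a full listing gets slow: assuming the standard S3 list API, which returns at most 1,000 keys per response, enumerating a bucket takes at least ceil(N / 1000) sequential requests per scan, while a listing restricted to a prefix only pages through the keys under that prefix. The sketch below simulates this without calling AWS. The bucket contents, the account ID, and the key layout (the default `AWSLogs/<account-id>/vpcflowlogs/<region>/<yyyy>/<mm>/<dd>/` layout for flow logs delivered to S3) are assumptions for illustration.

```python
import math

# Assumption: S3 list calls return at most 1,000 keys per response.
MAX_KEYS_PER_PAGE = 1000


def list_requests_needed(object_count, page_size=MAX_KEYS_PER_PAGE):
    """Minimum number of paged list requests to enumerate object_count keys."""
    # Even an empty listing costs one request.
    return max(1, math.ceil(object_count / page_size))


def list_requests_with_prefix(keys, prefix, page_size=MAX_KEYS_PER_PAGE):
    """Requests needed when only keys under `prefix` are paged through."""
    # S3 filters by prefix server-side, so non-matching keys are never returned.
    matching = sum(1 for key in keys if key.startswith(prefix))
    return list_requests_needed(matching, page_size)


# Hypothetical bucket: 30 days of flow logs, 2,000 objects per day.
keys = [
    f"AWSLogs/123456789012/vpcflowlogs/us-east-1/2022/04/{day:02d}/file-{i}.log.gz"
    for day in range(1, 31)
    for i in range(2000)
]

full_scan = list_requests_needed(len(keys))
one_day = list_requests_with_prefix(
    keys, "AWSLogs/123456789012/vpcflowlogs/us-east-1/2022/04/30/"
)
print(full_scan, one_day)  # 60 2
```

With a million objects the same arithmetic gives at least 1,000 list requests before any object is read, which is consistent with the slowdown described above.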

If the VPC flow logs have a structure similar to the CloudTrail logs, you could try using the prefix to reduce the number of objects to list, but again, you would need some external tool or script to change the prefix.
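A minimal sketch of the kind of external script described above, not leandrojmp's actual tool. It assumes the default VPC flow log key layout `AWSLogs/<account-id>/vpcflowlogs/<region>/<yyyy>/<mm>/<dd>/` and a config with a single `prefix => "..."` line inside the `s3` input. The helper names, bucket name, and account ID are hypothetical. Because the prefix embeds the account ID and region, a bucket that receives logs from several accounts would presumably need one `s3` input (and one rewritten prefix) per account and region.

```python
import datetime
import re


def flowlog_prefix(account_id, region, day):
    """Build a daily prefix, assuming the default VPC flow log key layout."""
    return f"AWSLogs/{account_id}/vpcflowlogs/{region}/{day:%Y/%m/%d}/"


# Matches a line like:    prefix => "some/old/prefix/"
PREFIX_LINE = re.compile(r'^([ \t]*prefix[ \t]*=>[ \t]*)"[^"]*"', re.MULTILINE)


def set_s3_prefix(config_text, new_prefix):
    """Replace the value of the first `prefix => "..."` setting in a config."""
    # A function replacement avoids backslash-escape surprises in new_prefix.
    updated, count = PREFIX_LINE.subn(
        lambda m: f'{m.group(1)}"{new_prefix}"', config_text, count=1
    )
    if count == 0:
        raise ValueError("no prefix setting found in config")
    return updated


# Hypothetical pipeline config that was pointed at the previous day.
config = """input {
  s3 {
    bucket => "central-flowlogs"
    prefix => "AWSLogs/123456789012/vpcflowlogs/us-east-1/2022/04/30/"
  }
}
"""

today = datetime.date(2022, 5, 1)
updated_config = set_s3_prefix(
    config, flowlog_prefix("123456789012", "us-east-1", today)
)
print(updated_config)
```

In practice something like this would run once a day (for example from cron) and write the file back, with Logstash picking up the change through `config.reload.automatic` or a pipeline restart.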

antonisnyc94 · May 1, 2022, 10:02pm

Hey @leandrojmp,

Thank you so much for your reply! There's a large number of VPC flow log files, and this is what causes the issue. I wish there was a setting in the S3 input to read files either randomly or by latest modification date. That way I could run multiple Logstash instances and increase the ingestion rate. Thank you for your suggestion, though; changing the prefix won't work for us.

Best,
Tony

ibra_013 · May 2, 2022, 7:48pm

Hi @leandrojmp

Could adding SNS between the S3 bucket and ELK help?
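For context on the SNS idea, a hedged sketch: S3 can publish an event notification for each new object, and with an SNS topic (usually fanned out to an SQS queue) the consumer learns about new keys directly instead of listing the bucket. The stock `s3` input discussed in this thread lists the bucket rather than reading notifications, so this route would need a different consumer. The Python below only parses one SNS-wrapped S3 notification as it would appear in an SQS message body with raw message delivery turned off; the function name, bucket name, and keys are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sns_body(body):
    """Yield (bucket, key) pairs for ObjectCreated events in an SNS-wrapped S3 event."""
    envelope = json.loads(body)
    # SNS carries the original S3 event JSON as a string in "Message".
    s3_event = json.loads(envelope["Message"])
    # s3:TestEvent messages (sent when notifications are set up) have no "Records".
    for record in s3_event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


# Hypothetical notification: one new flow log file and one deletion.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "central-flowlogs"},
                "object": {"key": "AWSLogs/123456789012/vpcflowlogs/us-east-1/2022/05/01/sample.log.gz"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "central-flowlogs"},
                "object": {"key": "old/file.log.gz"},
            },
        },
    ]
}
body = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})

pairs = list(objects_from_sns_body(body))
print(pairs)
```

Each consumer reading from the queue would then fetch only the announced objects, which is also what would let several instances share the work without each listing the whole bucket.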

system · May 30, 2022, 7:49pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
