S3 Output Plugin with many S3 buckets takes 30 minutes to start

dorth · June 7, 2019, 3:32am

I'm using Logstash 7.1.1 and logstash-output-s3 4.1.9. I have 38 different output locations (S3 buckets) depending on the logic. Logstash is taking nearly 30 minutes to start.

[2019-06-07T01:21:21,711][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:50:55,383][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...

When I replace all the S3 bucket locations with the file output plugin, it takes about 2 minutes.

[2019-06-07T01:13:56,530][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:15:42,680][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...

I know the s3 plugin is validating that all these buckets actually exist and are writable before startup, but this seems excessively slow. I'm running Logstash in AWS on t2.mediums (2 core / 4GB). Once Logstash is up and running, these servers keep up without breaking a sweat.

If my solution grows to include more buckets and logic, I fear the startup time will become a serious problem for autoscaling, in addition to being a pain during deployments.

Here's my s3 output configuration. I have 38 different sections and buckets.

s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  prefix => "%{+YYYY}/%{+MM}/%{+dd}"
  server_side_encryption => true
  server_side_encryption_algorithm => "AES256"
  time_file => 5
  codec => "json"
  canned_acl => "bucket-owner-full-control"
}
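
One thing that may be worth testing, given the startup validation mentioned above: the s3 output plugin has a `validate_credentials_on_root_bucket` option (default `true`) that controls the write-permission check performed against the bucket at startup. The following is only a sketch of the same block with that check turned off. Whether the check is what actually accounts for the 30 minutes is an assumption and has not been confirmed in this thread.

```
s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  prefix => "%{+YYYY}/%{+MM}/%{+dd}"
  server_side_encryption => true
  server_side_encryption_algorithm => "AES256"
  time_file => 5
  codec => "json"
  canned_acl => "bucket-owner-full-control"
  # Assumption: the startup write check is the slow step (unverified).
  # Skips the test write to the bucket when the plugin registers.
  validate_credentials_on_root_bucket => false
}
```

With the check disabled, a missing bucket or bad permissions would only show up at the first upload rather than at startup, so this trades early failure detection for a possibly faster start.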

dorth · June 12, 2019, 1:40am

In addition to the above, I created a VPC endpoint so that traffic to the S3 buckets would not traverse the public internet. I also tried a 2-core 8GB instance (t2.large) instead of a t2.medium. Startup still takes the same amount of time.

dorth · June 21, 2019, 8:55pm

I've opened this as a GitHub issue: https://github.com/logstash-plugins/logstash-output-s3/issues/208. If you have any follow-up, please post it there.

abcarroll · June 21, 2019, 11:17pm

Hi,

I am very new to Logstash, so unfortunately I can't be of much help as to why it takes so long. However, are you pulling data in from S3 or pushing it out to S3? Your title says input but your text says output.

If you are pulling in data (input): I am currently trying to build an s3-input-proxy, as I'm having endless issues with S3 and many worker threads. I haven't even tried scaling it up. Just wondering!

dorth · June 25, 2019, 4:07pm

Thanks for your reply. I am indeed running the OUTPUT plugin but wrote INPUT in the title. I have fixed that now. Sorry I can't be of assistance to you.

system · July 23, 2019, 4:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
