S3 Output Plugin with many S3 buckets takes 30 minutes to start

I'm using Logstash 7.1.1 and logstash-output-s3 4.1.9. I have 38 different output locations (S3 buckets), selected by conditional logic in the pipeline. Logstash takes nearly 30 minutes to start.

[2019-06-07T01:21:21,711][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:50:55,383][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...

When I replace all the S3 bucket locations with the file output plugin, it takes about 2 minutes.

[2019-06-07T01:13:56,530][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.1.1"}
[2019-06-07T01:15:42,680][INFO ][logstash.javapipeline    ] Starting pipeline {:pipeline_id=>"main", "pipeline.wo...

I know the s3 plugin validates that all these buckets actually exist and are writable before startup, but this seems excessively slow. I'm running Logstash in AWS on t2.medium instances (2 vCPUs / 4 GB). Once Logstash is up and running, these servers keep up without breaking a sweat.

If my solution grows to include more buckets and logic, I fear the startup time will become a huge issue for autoscaling, in addition to being a pain during deployments.

Here's one of my s3 output sections. I have 38 of these, each pointing to a different bucket.

s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  prefix => "%{+YYYY}/%{+MM}/%{+dd}"
  server_side_encryption => true
  server_side_encryption_algorithm => "AES256"
  time_file => 5
  codec => "json"
  canned_acl => "bucket-owner-full-control"
}

In addition to the above, I created a VPC endpoint so that traffic to the S3 buckets would not traverse the public internet. I also tried a t2.large instance (2 vCPUs / 8 GB) instead of a t2.medium. Startup still takes the same amount of time.
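
Since the delay appears tied to the plugin's startup validation of each bucket, one setting that may be worth testing is validate_credentials_on_root_bucket, which the logstash-output-s3 documentation describes as a way to disable the check at startup. A minimal sketch, assuming the check is in fact what is slow here (nothing in this thread confirms that) and that skipping it is acceptable, since a missing or unwritable bucket would then only surface when the first upload is attempted:

```
s3 {
  region => "us-east-1"
  bucket => "xxxxxx-prod"
  prefix => "%{+YYYY}/%{+MM}/%{+dd}"
  server_side_encryption => true
  server_side_encryption_algorithm => "AES256"
  time_file => 5
  codec => "json"
  canned_acl => "bucket-owner-full-control"
  # Assumption: skips the startup write test against the bucket.
  # Untested in this thread; the default is true.
  validate_credentials_on_root_bucket => false
}
```

The same line would need to be added to each of the 38 s3 sections for the comparison against the file output timing to be meaningful.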

I've opened this as a GitHub issue: https://github.com/logstash-plugins/logstash-output-s3/issues/208. If you have any follow-up, please post it there.

Hi,

I am very new to Logstash, so unfortunately I can't be of much help as to why it takes so long. However, are you pulling data in or pushing it out to S3? Your title says input but your text says output.

If you are pulling in data (input): I am currently trying to build an s3-input-proxy, as I'm having endless issues with S3 and many worker threads. I haven't even tried scaling it up. Just wondering!

Thanks for your reply. I am indeed running the OUTPUT plugin but wrote INPUT in the title. I have fixed that now. Sorry I can't be of assistance to you.
