Logstash S3 input plugin restarts constantly because it fails to open a TCP connection to the target S3 bucket

Background:

We are using the Logstash S3 input plugin to ingest logs from S3 buckets in AWS. However, the Logstash plain logs show constant plugin errors that cause the plugin to restart. The errors appear to be caused by failed TCP connections.

Each time the plugin restarts, it seems to iterate over all the objects in the S3 bucket again before processing any logs. This appears to add lag to the ingestion of those logs.

The Logstash config itself works, and logs are being ingested.

The same error occurs for different S3 buckets, on average ~43 times per day.

Details

We tried the following:

  1. The error message is "Error: Failed to open TCP connection to bucketA.s3.eu-central-1.amazonaws.com:443 (initialize: name or service not known)". We examined a tcpdump capture: the DNS resolution succeeded and returned a valid IP.

  2. Increased the JVM heap size from 1 GB to 4 GB.

  3. Reduced the number of folders to ingest. This shortened the time taken to iterate the objects in S3 and therefore reduced the lag, but the plugin restart error still occurs.
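Since "name or service not known" is typically what a failed hostname lookup reports, one way to complement the tcpdump check in step 1 is to resolve the bucket hostname repeatedly from the Logstash host and record only the failures. The Python sketch below is an assumption-laden illustration, not part of our setup: the function names `resolve_once` and `monitor` are hypothetical, and it uses the operating system resolver, which may not behave exactly like the resolver used inside the Logstash JVM.

```python
import socket
import time
from datetime import datetime, timezone


def resolve_once(host, port=443, resolver=socket.getaddrinfo):
    """Resolve host once. Returns (True, [ips]) or (False, error text)."""
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # gaierror is raised when the name lookup itself fails.
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return True, sorted({info[4][0] for info in infos})


def monitor(host, attempts, delay_seconds,
            resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host repeatedly and return a list of (timestamp, error)."""
    failures = []
    for _ in range(attempts):
        ok, detail = resolve_once(host, resolver=resolver)
        if not ok:
            stamp = datetime.now(timezone.utc).isoformat()
            failures.append((stamp, detail))
        sleep(delay_seconds)
    return failures
```

For example, `monitor("bucketA.s3.eu-central-1.amazonaws.com", attempts=1440, delay_seconds=60)` would sample roughly once a minute for a day. If the returned timestamps line up with the plugin restarts in the Logstash log, that would point at intermittent name resolution on the host rather than at the plugin.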

Error Message Sample

Some of the info is masked, such as the pipeline ID and bucket name.

[2023-01-06T00:10:34,014][ERROR][logstash.javapipeline    ][pipeline A][pipeline ID A] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id: pipeline A
  Plugin: <LogStash::Inputs::S3 bucket=>"bucket A", include_object_properties=>true, prefix=>"Prefix A", id=>"pipeline ID A", region=>"eu-central-1", type=>"A-log", sincedb_path=>"/var/lib/logstash/plugins/inputs/s3/sincedb_A", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_A", enable_metric=>true, charset=>"UTF-8">, role_session_name=>"logstash", delete=>false, interval=>60, watch_for_new_files=>true, temporary_directory=>"/tmp/logstash", gzip_pattern=>".gz(ip)?$">
  Error: Failed to open TCP connection to bucketA.s3.eu-central-1.amazonaws.com:443 (initialize: name or service not known)
  Exception: Seahorse::Client::NetworkingError
  Stack: /usr/share/logstash/vendor/jruby/lib/ruby/stdlib/net/http.rb:943:in `block in connect'
org/jruby/ext/timeout/Timeout.java:114:in `timeout'
org/jruby/ext/timeout/Timeout.java:90:in `timeout'
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/net/http.rb:939:in `connect'
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/net/http.rb:924:in `do_start'
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/net/http.rb:919:in `start'
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/delegate.rb:83:in `method_missing'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/net_http/connection_pool.rb:285:in `start_session'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/net_http/connection_pool.rb:92:in `session_for'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/net_http/handler.rb:119:in `session'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/net_http/handler.rb:71:in `transmit'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/net_http/handler.rb:45:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/seahorse/client/plugins/content_length.rb:12:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/s3_request_signer.rb:88:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/s3_request_signer.rb:23:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/s3_host_id.rb:14:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/xml/error_handler.rb:8:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/s3_request_signer.rb:65:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/s3_redirects.rb:15:in `call'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/aws-sdk-core-2.11.632/lib/aws-sdk-core/plugins/retry_errors.rb:108:in `call'

Not sure whether anyone has run into the same issue or knows how to fix it.

Do your buckets have a lot of files? If I'm not wrong, the s3 input plugin lists all the objects in the bucket every time it runs, which is one of the things that makes it slow sometimes.

There are a couple of stalled issues to improve the performance when working with buckets with a lot of objects, but no progress or change yet.

My experience using the s3 input with buckets that have a lot of objects, like CloudTrail buckets, was pretty bad. I had issues similar to yours and eventually gave up and wrote a small Python script to collect the logs and put them in a folder for Logstash to process.
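For anyone who wants to try the same workaround, here is a rough sketch of what such a collector could look like. It is not the script mentioned above: the function names, the JSON state file, and the flattened file naming are all assumptions made for illustration. It expects a boto3 S3 client (for example `boto3.client("s3")`) and relies only on its `get_paginator("list_objects_v2")` and `download_file` calls.

```python
import json
import os


def load_seen(state_path):
    """Load the set of object keys that were already downloaded."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_seen(state_path, seen):
    """Persist the downloaded keys so the next run can skip them."""
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)


def collect(client, bucket, prefix, dest_dir, state_path):
    """Download objects under prefix that have not been fetched before.

    client is assumed to be a boto3 S3 client. Returns the list of
    local file paths written during this run.
    """
    seen = load_seen(state_path)
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Skip keys handled earlier and "folder" placeholder keys.
            if key in seen or key.endswith("/"):
                continue
            # Hypothetical naming scheme: flatten the key into one file name.
            target = os.path.join(dest_dir, key.replace("/", "_"))
            client.download_file(bucket, key, target)
            seen.add(key)
            downloaded.append(target)
    save_seen(state_path, seen)
    return downloaded
```

Run from cron or a loop, this drops new files into `dest_dir`, where a Logstash file input could pick them up. Note that it still lists every object under the prefix on each run, so a narrow prefix matters here as well.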

Hi @leandrojmp, thanks for the reply.

One of the target buckets is significantly large, and iterating the objects before processing the logs sometimes takes a long time, around 10 to 30 minutes. We mitigated this by configuring Logstash to ingest only certain folders in the bucket.

However, the plugin restart error happens constantly for both small and large buckets.