Logstash input S3 module problem

The problem is with the operation of the S3 input module: after starting, the module works fine for a while, but after a couple of hours it fails with the following error:

Error: Too many open files - Too many open files
May 25 06:06:07 logstash02.test logstash[143470]:   Exception: Errno::EMFILE
May 25 06:06:07 logstash02.test logstash[143470]:   Stack: org/jruby/RubyIO.java:1234:in `sysopen'
May 25 06:06:07 logstash02.test logstash[143470]: org/jruby/RubyIO.java:3774:in `read'

After that, no new data is received or processed, which badly affects the rest of Logstash's work.

I could not find a parameter in the documentation to limit the number of open files or to control how long files are kept open before being closed.

Is there a solution to this problem? Otherwise I have run out of ideas.

This is more of an OS issue than a Logstash one; the simplest solution is to increase the number of files that the Logstash process can open.

By default, Logstash has this in its systemd unit file:

LimitNOFILE=16384

You can increase the number to see if it helps.
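
For example, with a systemd drop-in (a sketch; pick a value that fits your environment):

# create a drop-in with: systemctl edit logstash
[Service]
LimitNOFILE=65536

# then restart the service so the new limit takes effect
systemctl restart logstash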

Does your s3 bucket have a lot of files? Are the files small or large? The s3 input basically downloads each file, creates a temporary file while it processes it, and then removes the temporary file. If you have a lot of files in the bucket, this can result in a lot of temporary files, and depending on those files, a lot of open file descriptors.
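
One quick way to check this is to watch how many temporary files pile up in the plugin's temporary_directory (it defaults to /tmp/logstash on Linux), for example:

# count the temporary files created by the s3 input, refreshed every 10 seconds
watch -n 10 'ls /tmp/logstash | wc -l'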

You may try to change the interval, reducing it from 60 to 30 for example, so it will check the bucket more often and download fewer files each time.
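
For reference, a minimal sketch of the relevant part of the s3 input configuration (the bucket name here is a placeholder):

input {
  s3 {
    bucket => "my-bucket"                      # placeholder, use your own bucket
    region => "eu-central-1"
    interval => 30                             # check the bucket every 30 seconds instead of 60
    temporary_directory => "/tmp/logstash"
  }
}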

Thanks for the recommendation, I'll try it and report back; it's my last hope to get this working.

Hi. I changed the setting
"interval" => "60"
but the problem is still the same:

Jun 29 06:00:40 logstash02.test logstash[67036]: [2023-06-29T06:00:40,325][ERROR][logstash.javapipeline    ][aws-tenant-logs-pipe][41941adc0390a2f8cd13a9ff8c9732018535bd35dff6d9fd8948687aa956185a] A plugin had an unrecoverable error. Will restart this plugin.
Jun 29 06:00:40 logstash02.test logstash[67036]:   Pipeline_id:aws-tenant-logs-pipe
Jun 29 06:00:40 logstash02.test logstash[67036]:   Plugin: <LogStash::Inputs::S3 access_key_id=>"xxxx", bucket=>"aws-eu-central-1", gzip_pattern=>"\\.gz", additional_settings=>{"ssl_verify_peer"=>"false", "force_path_style"=>"true", "follow_redirects"=>"false", "http_wire_trace"=>"true"}, prefix=>"/AWSLogs/", secret_access_key=><password>, exclude_pattern=>"_CloudTrail-Digest_|_202302|_2023030|\\/2023\\/02\\/|_202303|\\/2023\\/03\\/|\\/2023\\/04\\/|\\/2023\\/05\\/|\\/2023\\/06/1|\\/2023\\/06\\/0\\/2023\\/06\\/20|\\/2023\\/06\\/21|\\/2023\\/06\\/23|\\/2023\\/06\\/22|\\/2023\\/06\\/24|\\/2023\\/06\\/25|\\/2023\\/06\\/26|\\/2023\\/06\\/27", interval=>60, id=>"41941adc0390a2f8cd13a9ff8c9732018535bd35dff6d9fd8948687aa956185a", region=>"eu-central-1", sincedb_path=>"/opt/s3bucket/sincedb", proxy_uri=>"http://proxy:8080", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_98767b8b-1b9a-48f5-bfde-331a8895a322", enable_metric=>true, charset=>"UTF-8">, role_session_name=>"logstash", delete=>false, watch_for_new_files=>true, temporary_directory=>"/tmp/logstash", include_object_properties=>false>
Jun 29 06:00:40 logstash02.test logstash[67036]:   Error: Too many open files - Too many open files
Jun 29 06:00:40 logstash02.test logstash[67036]:   Exception: Errno::EMFILE
Jun 29 06:00:40 logstash02.test logstash[67036]:   Stack: org/jruby/RubyIO.java:1234:in `sysopen'
Jun 29 06:00:40 logstash02.test logstash[67036]: org/jruby/RubyIO.java:3774:in `read'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-s3-3.8.4/lib/logstash/inputs/s3.rb:455:in `read'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-s3-3.8.4/lib/logstash/inputs/s3.rb:142:in `list_new_files'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-s3-3.8.4/lib/logstash/inputs/s3.rb:186:in `process_files'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-s3-3.8.4/lib/logstash/inputs/s3.rb:133:in `block in run'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:20:in `interval'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-s3-3.8.4/lib/logstash/inputs/s3.rb:132:in `run'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:410:in `inputworker'
Jun 29 06:00:40 logstash02.test logstash[67036]: /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:401:in `block in start_input'

Could this be a problem with the module itself, so that it does not work correctly and needs some changes?

Did you try to increase the number of files that Logstash can open? You didn't answer any of the questions asked in the previous post:

Does your s3 bucket have a lot of files? Are the files small or large?

So it is pretty hard to suggest anything.

I do not work for Elastic; if you think this is a bug, you should open an issue on GitHub, and Elastic may or may not look into it.

I apologize for not responding right away.

  • Yes, I tried to increase the number of open files.
# End of file
*       soft    nofile  24800
*       hard    nofile  65535
*       soft    stack   4096
*       hard    stack   4096
*       soft    nproc   24576
*       hard    nproc   24576

and

# su logstash -s /bin/bash -c 'ulimit  -a'
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127914
max locked memory           (kbytes, -l) 4107844
max memory size             (kbytes, -m) unlimited
open files                          (-n) 24800
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 4096
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 24576
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
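
For completeness, the limit and current usage can also be checked on the running process itself (a quick sketch, assuming the service runs under systemd):

# effective open-files limit of the running Logstash process
grep 'open files' /proc/$(systemctl show -p MainPID --value logstash)/limits

# number of file descriptors that process currently has open
ls /proc/$(systemctl show -p MainPID --value logstash)/fd | wc -l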

Are there many files in your s3 bucket? Are the files small or large?

  • There are a lot of folders and small files in S3. I am trying to exclude some of the unnecessary ones with a pattern.
 "exclude_pattern" => "_CloudTrail-Digest_|_202302|_2023030|\/2023\/02\/|_202303|\/2023\/03\/|\/2023\/04\/|\/2023\/05\/|\/2023\/06/1|\/2023\/06\/0\/2023\/06\/20|\/2023\/06\/21|\/2023\/06\/23|\/2023\/06\/22|\/2023\/06\/24|\/2023\/06\/25|\/2023\/06\/26|\/2023\/06\/27"

I'm still a novice at this; please tell me where to open this case so that someone can look into it and tell me what is causing the problem.

[Unit]
Description=logstash

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=1048576
#LimitNOFILE=16384


# When stopping, how long to wait before giving up and sending SIGKILL?
# Keep in mind that SIGKILL on a process can cause data loss.
TimeoutStopSec=infinity

[Install]
WantedBy=multi-user.target
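
To confirm that the new limit is actually applied to the running service (a quick check):

systemctl daemon-reload
systemctl restart logstash
systemctl show logstash --property=LimitNOFILE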
