Using the s3 input plugin, Logstash file descriptors aren't closed, leading to a system crash

Been trying to figure out our problem for the past week and haven't found a solution other than restarting Logstash multiple times when it runs out of resources. Looking for any help or suggestions on what we can try; right now I'm doing ingests in chunks and restarting Logstash between them.

Problem in a nutshell

The s3 plugin appears to keep opening file descriptors until it reaches the open-file limit, at which point Logstash becomes totally unresponsive and is effectively hung. If the file limit is set to something small like 4, 8, or 16K files, that's exactly what happens. When I set it higher (e.g. 32K or 64K files), it reaches a certain point (around 25K open file descriptors) and then completely chokes from lack of memory, with the JVM heap at roughly 95%+ of the 8GB allocated. Either way, the root problem seems to be too many open file descriptors.

Related threads

There are a handful of topics and issues describing similar problems; none are smoking guns and most seem unresolved:

Logstash crashing with exception IOError: Too many open files · Issue #4815 · elastic/logstash · GitHub (OPEN since 2018)
File descriptors are leaked when using HTTP · Issue #1604 · elastic/logstash · GitHub (CLOSED)
elasticsearch - Logstash close file descriptors? - Stack Overflow

Configuration & Pipeline

I've mainly been running this in a Docker container with Xmx and Xms set to 8GB. I've also tried running Logstash directly on our Linux machine (no Docker) and the results are the same.
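For reference, the heap is pinned in config/jvm.options (or via the LS_JAVA_OPTS environment variable when using the official Docker image); this is just a minimal sketch of those two flags, not our full options file:

    -Xms8g
    -Xmx8g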

Here are the config files; pretty simple overall:

Pipeline in a nutshell: we have several pipelines reading data from S3. Each s3 input points to a separate bucket that has multiple subfolders (prefixes); each subfolder has ~200 files/day, and we're currently trying to ingest about 90 days' worth of data, so about 18K files per pipeline.
logstash.yml - nothing special configured, basically using defaults
pipelines.yml - example of pipeline

  - pipeline.id: sample
    path.config: "sample-pipeline.conf"
    queue.type: persisted

sample-pipeline.conf

input { s3 {} }
filter { mutate, kv, grok, drop, fingerprint }
output { elasticsearch {} }
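
To give a slightly fuller picture, here's roughly what each pipeline config looks like; the bucket name, prefix, grok pattern, drop condition, and index name below are placeholders rather than our real values:

    input {
      s3 {
        bucket => "sample-bucket"          # placeholder bucket name
        prefix => "subfolder-1/"           # one of the subfolder prefixes
        region => "us-east-1"
        interval => 60
      }
    }

    filter {
      kv { }
      grok { match => { "message" => "%{GREEDYDATA:rest}" } }    # real pattern omitted
      mutate { remove_field => [ "rest" ] }
      fingerprint { source => [ "message" ] target => "[@metadata][fingerprint]" }
      if [message] =~ /DEBUG/ { drop { } }                       # placeholder drop condition
    }

    output {
      elasticsearch {
        hosts => [ "https://localhost:9200" ]   # placeholder endpoint
        index => "sample-%{+YYYY.MM.dd}"
      }
    }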

Attempted Fixes

I've played with the open file limits, the allocated JVM RAM, and other configuration options, but at the end of the day I can't figure out how to get the file descriptors to close, so we either hit the limit or the heap maxes out and GC takes over and dominates all free resources. I've attempted a few things on the input side, like setting the delete option so that each file is deleted from S3 after it's read (see the sketch below). Still no luck, so we're stuck with the workaround of running batches of files, shutting Logstash down, and restarting.
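
For reference, the delete attempt looked roughly like this (bucket and prefix are placeholders):

    input {
      s3 {
        bucket => "sample-bucket"   # placeholder
        prefix => "subfolder-1/"
        region => "us-east-1"
        delete => true              # remove each object from S3 once it has been processed
      }
    }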

I was thinking about posting this as a bug on GitHub but figured I'd try getting support here first. Our workaround of restarting Logstash a few times a day is working for now, but it's not ideal and this definitely feels like it should be a bug report. Not sure if it's an S3 plugin issue or a wider Logstash issue.

OK, so I'm pretty certain this is a bug and am looking to open an issue on GitHub linked to this thread. It continues to happen with our dataset; I've bumped the JVM heap for Logstash to 16GB, which lets us push a little more data, but eventually things still crap out with too many open file descriptors.

  • With an 8GB heap, Logstash runs well up to about 25K open file descriptors, at which point the JVM heap is at 99% and GC consumes all the CPU, effectively shutting our pipeline down.
  • With a 16GB heap, we get to about 50K open file descriptors before the same thing happens.

I'm curious if anyone can explain the 8GB recommendation for the JVM heap from Elastic. Why not allocate more? Why is the recommendation 4-8GB?

Increasing the heap size comes with costs: not just the memory usage, but also an increase in the number of GC roots that remain in memory, which drives up the cost of running GC.

Depending on the pattern of memory usage, you may see an increase in GC cost as the heap allocation increases (not as a percentage, but as an absolute value). In some cases those increases can be significant.

@Badger - thanks, appreciate the feedback. I've been reading up and found this article very useful: https://www.elastic.co/blog/a-heap-of-trouble. I'm starting to understand the inner workings of Elastic and Java applications a bit more.
