Logstash crashing

Input is from S3. The layout of the S3 bucket is:

s3 {
    ....
    bucket => "bucket"
    prefix => "YYYY/MM/DD/hh/"
    ....
}

So every hour I have to create a new conf file with the corresponding prefix so that hour gets processed accordingly.
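For context, each generated hourly file looks roughly like this sketch (the bucket name, region, and output settings are placeholders, not the real values):

    input {
      s3 {
        bucket => "bucket"              # placeholder bucket name
        region => "us-east-1"           # placeholder region
        prefix => "2023/09/14/13/"      # the specific hour this conf file covers
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder destination
        index => "s3-logs-%{+YYYY.MM.dd}"
      }
    }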

Issue:
Logstash crashes (I assume because there are too many conf files).

Questions:
Is there a way to use a single conf file per day, instead of creating multiple files (one per hour)?
If I change the prefix to only include the year/month/day, it does not download anything.

How do I overcome the crashing of Logstash?
Here is a partial log:

[2023-09-13T14:54:50,913][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, 
"pipeline.sources"=>["/etc/logstash/conf.d/s3_nr_2023091121.conf", "/etc/logstash/conf.d/s3_nr_2023091200.conf", "/etc/logstash/conf.d/s3_nr_2023091300.conf", 
"/etc/logstash/conf.d/s3_nr_2023091301.conf", "/etc/logstash/conf.d/s3_nr_2023091302.conf", 
"/etc/logstash/conf.d/s3_nr_2023091321.conf", "/etc/logstash/conf.d/s3_nr_2023091322.conf", 
"/etc/logstash/conf.d/s3_nr_2023091323.conf", "/etc/logstash/conf.d/s3_nr_2023091400.conf", 
"/etc/logstash/conf.d/s3_nr_2023091401.conf", "/etc/logstash/conf.d/s3_nr_2023091402.conf", 
"/etc/logstash/conf.d/s3_nr_2023091403.conf", "/etc/logstash/conf.d/s3_nr_2023091404.conf", 
"/etc/logstash/conf.d/s3_nr_2023091405.conf", "/etc/logstash/conf.d/s3_nr_2023091406.conf", 
"/etc/logstash/conf.d/s3_nr_2023091407.conf", "/etc/logstash/conf.d/s3_nr_2023091408.conf", 
"/etc/logstash/conf.d/s3_nr_2023091409.conf", "/etc/logstash/conf.d/s3_nr_2023091410.conf", 
"/etc/logstash/conf.d/s3_nr_2023091411.conf", "/etc/logstash/conf.d/s3_nr_2023091412.conf", 
"/etc/logstash/conf.d/s3_nr_2023091413.conf", "/etc/logstash/conf.d/s3_nr_2023091414.conf"], :thread=>"#<Thread:0x728c9767@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}

[2023-09-13T14:54:50,935][FATAL][org.logstash.Logstash    ][main] uncaught error (in thread Ruby-0-Thread-58: /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:289)
java.lang.StackOverflowError: null
	at java.util.Spliterators$IteratorSpliterator.estimateSize(java/util/Spliterators.java:1865) ~[?:?]
	at java.util.Spliterator.getExactSizeIfKnown(java/util/Spliterator.java:414) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(java/util/stream/AbstractPipeline.java:508) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(java/util/stream/AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(java/util/stream/ReduceOps.java:921) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(java/util/stream/AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(java/util/stream/ReferencePipeline.java:682) ~[?:?]
	at org.logstash.config.ir.CompiledPipeline$CompiledExecution.compileDependencies(org/logstash/config/ir/CompiledPipeline.java:560) ~[logstash-core.jar:?]
	at org.logstash.config.ir.CompiledPipeline$CompiledExecution.flatten(org/logstash/config/ir/CompiledPipeline.java:514) ~[logstash-core.jar:?]
	at org.logstash.config.ir.CompiledPipeline$CompiledExecution.filterDataset(org/logstash/config/ir/CompiledPipeline.java:435) ~[logstash-core.jar:?]
	at org.logstash.config.ir.CompiledPipeline$CompiledExecution.lambda$compileDependencies$6(org/logstash/config/ir/CompiledPipeline.java:537) ~[logstash-core.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(java/util/stream/ReferencePipeline.java:197) ~[?:?]

Unfortunately no, the prefix option is not dynamic.
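To illustrate (my sketch, not something quoted from the plugin docs): the prefix is taken as a literal string, so a date pattern like the one below is not expanded by the plugin:

    s3 {
      bucket => "bucket"
      # Does NOT roll over by date: prefix is a literal string, and
      # Logstash sprintf date references are not interpolated here.
      prefix => "%{+YYYY/MM/dd}/"
    }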

This is weird, it should work with any prefix, so there is some issue here. Do you have anything in the logs?

Also, do you have a lot of files? The performance of the s3 input on buckets with a lot of files is pretty bad; it is basically unusable in some cases.

If you have a lot of files per hour and are using the day prefix, it may take a while to list everything and start downloading.
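If you control the bucket, one way to keep the listing small is to let the input delete or move objects once they have been read; a sketch (the backup bucket name is hypothetical):

    s3 {
      bucket => "bucket"
      prefix => "2023/09/14/"
      # Either delete objects after they are processed...
      delete => true
      # ...or move them aside so subsequent listings stay short.
      backup_to_bucket => "processed-bucket"
    }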

Yes, we have thousands of files every hour, and we keep about 4 months of data, so I need to use the prefix; otherwise it will never finish ingesting the older data. I just need to ingest the most recent.

Do you know if the issue with Logstash crashing is because of the number of config files?
Maybe too many for the number of processors?

I'm not sure, but it could be, since every config file adds an input, filters, and outputs.
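Note that by default everything under /etc/logstash/conf.d is concatenated into the single main pipeline, which is what your startup log shows. A sketch of keeping each conf file in its own pipeline instead (assuming the stock pipelines.yml location):

    # /etc/logstash/pipelines.yml
    # One pipeline per hourly file, so each conf is compiled on its own
    # instead of being merged into one very large pipeline.
    - pipeline.id: s3_nr_2023091413
      path.config: "/etc/logstash/conf.d/s3_nr_2023091413.conf"
    - pipeline.id: s3_nr_2023091414
      path.config: "/etc/logstash/conf.d/s3_nr_2023091414.conf"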

If I'm not wrong, a java.lang.StackOverflowError means a thread has exhausted its stack; your trace shows it happening while Logstash compiles the pipeline, which grows with every conf file you add.
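If the overflow only happens while the pipeline is being compiled, a larger per-thread stack is a possible workaround (a sketch; 4m is an arbitrary value, not a recommendation from this thread):

    # /etc/logstash/jvm.options
    # Raise the per-thread stack size so compiling a large pipeline
    # does not overflow the default stack.
    -Xss4m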

Do you need to have all those inputs running? For example, are you still adding data to the bucket for hours that have already passed?

I can delete them once they are done and only keep what is needed, but the issue is that I cannot tell when each one completes; if I could, I would keep only 4 or 5 at a time.

Unless there is a way to know when it finishes parsing all the files in a bucket.
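(I don't know of an explicit "done" signal, but each s3 input tracks its progress in a sincedb file, which records the last-modified timestamp it has processed. A sketch, with a hypothetical path, that would at least make that file easy to inspect:)

    s3 {
      bucket => "bucket"
      prefix => "2023/09/14/13/"
      # Hypothetical explicit location: comparing the timestamp stored here
      # with the newest object under the prefix hints at whether this hour
      # has been fully read.
      sincedb_path => "/var/lib/logstash/sincedb_s3_2023091413"
    }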
