Logstash throws java.lang.OutOfMemoryError: Java heap space no matter the heap size

I'm attempting to parse a huge CSV file (a few million lines) with Logstash and output it to Elasticsearch.

[FATAL] 2023-04-16 19:00:19.011 [LogStash::Runner] Logstash -
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapCharBuffer.<init>(java/nio/HeapCharBuffer.java:61) ~[?:?]
        at java.nio.CharBuffer.allocate(java/nio/CharBuffer.java:348) ~[?:?]
        at java.nio.charset.CharsetDecoder.decode(java/nio/charset/CharsetDecoder.java:807) ~[?:?]
        at java.nio.charset.Charset.decode(java/nio/charset/Charset.java:814) ~[?:?]
        at org.jruby.RubyEncoding.decodeUTF8(org/jruby/RubyEncoding.java:297) ~[jruby-complete-9.2.20.1.jar:?]
        at org.jruby.RubyString.decodeString(org/jruby/RubyString.java:802) ~[jruby-complete-9.2.20.1.jar:?]
        at org.jruby.RubyString.toString(org/jruby/RubyString.java:793) ~[jruby-complete-9.2.20.1.jar:?]
        at org.logstash.Javafier.lambda$initConverters$1(org/logstash/Javafier.java:88) ~[logstash-core.jar:?]
        at org.logstash.Javafier$$Lambda$586/0x00000001012b4c40.convert(org/logstash/Javafier$$Lambda$586/0x00000001012b4c40) ~[?:?]
        at org.logstash.Javafier.deep(org/logstash/Javafier.java:57) ~[logstash-core.jar:?]
        at org.logstash.Event.getField(org/logstash/Event.java:177) ~[logstash-core.jar:?]
        at org.logstash.StringInterpolation.evaluate(org/logstash/StringInterpolation.java:86) ~[logstash-core.jar:?]
        at org.logstash.Event.sprintf(org/logstash/Event.java:363) ~[logstash-core.jar:?]
        at org.logstash.ext.JrubyEventExtLibrary$RubyEvent.sprintf(org/logstash/ext/JrubyEventExtLibrary.java:202) ~[logstash-core.jar:?]
        at java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java/lang/invoke/DirectMethodHandle$Holder) ~[?:?]
        at java.lang.invoke.LambdaForm$MH/0x0000000100780840.invoke(java/lang/invoke/LambdaForm$MH) ~[?:?]
        at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(java/lang/invoke/DelegatingMethodHandle$Holder) ~[?:?]
        at java.lang.invoke.LambdaForm$MH/0x0000000100737c40.guard(java/lang/invoke/LambdaForm$MH) ~[?:?]
        at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(java/lang/invoke/DelegatingMethodHandle$Holder) ~[?:?]
        at java.lang.invoke.LambdaForm$MH/0x0000000100737c40.guard(java/lang/invoke/LambdaForm$MH) ~[?:?]
        at java.lang.invoke.Invokers$Holder.linkToCallSite(java/lang/invoke/Invokers$Holder) ~[?:?]
        at usr.share.logstash.logstash_minus_core.lib.logstash.util.decorators.add_fields(/usr/share/logstash/logstash-core/lib/logstash/util/decorators.rb:34) ~[?:?]
        at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java/lang/invoke/DirectMethodHandle$Holder) ~[?:?]
        at java.lang.invoke.LambdaForm$MH/0x00000001012b7840.invoke(java/lang/invoke/LambdaForm$MH) ~[?:?]
        at java.lang.invoke.Invokers$Holder.invokeExact_MT(java/lang/invoke/Invokers$Holder) ~[?:?]
        at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1821) ~[jruby-complete-9.2.20.1.jar:?]
        at java.lang.invoke.LambdaForm$DMH/0x0000000100763040.invokeVirtual(java/lang/invoke/LambdaForm$DMH) ~[?:?]
        at java.lang.invoke.LambdaForm$MH/0x0000000100780840.invoke(java/lang/invoke/LambdaForm$MH) ~[?:?]

This error is thrown after a few minutes of running Logstash.
There are similar threads asking about this very question, but none of them are properly answered.

Inside my /etc/logstash/jvm.options:
-Xms2g
-Xmx2g

My machine has 8 GB of RAM. I have tried lowering the heap and I have tried raising it, but nothing helps. If I set it too high, after a while Logstash just crashes with "Killed".

/etc/elasticsearch/jvm.options.d/jvmheap.options has:
-Xms2g
-Xmx2g

I don't know what to do, as I really cannot afford to buy a server with more RAM, but I need to parse this file no matter what.

What does your configuration look like? Inputs, filters...

input {
    file {
        path => "path/to/file.csv"
        start_position => "beginning"
    }
}
filter {
  csv {
    autodetect_column_names => false
    columns => ["username", "uid"]
    target => "_tmp"
  }
  mutate {
    add_field => {
      "[data][username]" => "%{[_tmp][username]}"
      "[data][uid]" => "%{[_tmp][uid]}"
    }
  }
  mutate {
    remove_field => ["_tmp"]
  }
  prune {
    whitelist_names => [ "data" ]
  }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        index => "uids"
    }
    stdout {}
}

I would not expect that configuration to need more than 300 MB to run in! You can simplify the filters a little:

csv {
  columns => ["username", "uid"]
  target => "data"
  autogenerate_column_names => false
}

This will only parse the first two columns of the CSV file, so you can remove the mutate filters and even the prune. The OOM is happening in the add_field (note the add_fields call from decorators.rb in your stack trace), although I doubt removing it will change much.

Reducing pipeline.batch.size from the default of 125 might help.
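Logstash keeps roughly pipeline.batch.size × pipeline.workers events in flight at a time, so a smaller batch caps how many decoded events sit in the heap at once. A minimal sketch of what that could look like (25 is just an illustrative value):

# /etc/logstash/logstash.yml
pipeline.batch.size: 25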

There are multiple columns in the CSV file and I'm extracting only the ones I need; unfortunately, getting just the first two columns would not work.

Edit: Decreasing pipeline.batch.size did not help :^(

But that is what your filter configuration would do if it didn't run out of memory! The first two columns will be named username and uid; the rest will be column3, column4, etc., and will get removed when you delete [_tmp].
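For illustration, a hypothetical input line alice,1001,extra parsed with columns => ["username", "uid"] and target => "_tmp" would produce:

[_tmp][username] => "alice"
[_tmp][uid]      => "1001"
[_tmp][column3]  => "extra"   (auto-generated name, dropped along with [_tmp])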

How about setting pipeline.workers to 1? It will be slower, but it will use fewer resources.
Also, maybe lower file_chunk_count on the file input (the default value is 4611686018427387903, i.e. effectively unlimited) so the file isn't read all at once.
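A sketch of both suggestions, using the same input as above (32 is just an illustrative chunk count; each chunk is file_chunk_size bytes, 32 KB by default):

# in /etc/logstash/logstash.yml
pipeline.workers: 1

# in the pipeline configuration
input {
    file {
        path => "path/to/file.csv"
        start_position => "beginning"
        file_chunk_count => 32    # read 32 chunks (of file_chunk_size bytes) per pass instead of the whole file
    }
}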

