Hello, we are using Logstash to collect sFlow. We use Fluentd at the ingest point, which forwards flows to Logstash for some processing before pushing to Elastic Cloud. Both reside on the same box and largely work great together.
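For context, the Fluentd side is roughly this (a sketch from memory; the tag and any buffer settings are illustrative, not our exact config):
<match sflow.**>
  @type forward
  <server>
    host 127.0.0.1
    port 8888
  </server>
</match>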
We noticed on one of our higher-volume flow boxes that Logstash appears to be leaking memory, with items piling up inside Logstash and slowly consuming the entire heap.
We were first able to see this in VisualVM, with the heap slowly growing over a few days to the point of causing OOM errors. We added more memory in jvm.options (and to the host) but saw the same behavior.
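For reference, the heap bump was just the standard jvm.options change (exact values illustrative; we tried a few sizes):
-Xms8g
-Xmx8g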
This led us to pull a heap dump and explore it in Eclipse MAT, as the Elastic documentation suggests. I'm by no means an expert in Java or its heap dumps, but I believe we have found something interesting. I'm not sure how to interpret it or how to fix it, and I was hoping somebody on this forum could assist.
This is the input segment we believe is causing this:
input {
  tcp {
    host => "127.0.0.1"
    port => "8888"
    ecs_compatibility => "disabled"
    codec => "fluent"
  }
}
FWIW, we see the same result whether or not we use the fluent codec.
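For completeness, the codec-less variant we tested is the same block minus the codec line, which falls back to the tcp input's default line codec:
input {
  tcp {
    host => "127.0.0.1"
    port => "8888"
    ecs_compatibility => "disabled"
    # no codec specified; the tcp input defaults to the line codec
  }
}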
We see a LARGE number of items being retained that read "errorinfo: org.jrubyStandarderror" and "Decode_buffer bytes not available".
The truncated items are:
usr.share.logstash.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_tcp_minus_6_dot_4_dot_1_minus_java.lib.logstash.inputs.tcp.decoder_impl.decode() (/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-input-tcp-6.4.1-java/lib/logstash/inputs/tcp/decoder_impl.rb:23)
usr.share.logstash.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_tcp_minus_6_dot_4_dot_1_minus_java.lib.logstash.inputs.tcp.decode_buffer() (/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-input-tcp-6.4.1-java/lib/logstash/inputs/tcp.rb:219)
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-input-tcp-6.4.1-java/lib/logstash/inputs/tcp/decoder_impl.rb:23
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-input-tcp-6.4.1-java/lib/logstash/inputs/tcp.rb:219
This is the decode_buffer code segment referenced in tcp.rb:
217 def decode_buffer(client_ip_address, client_address, client_port, codec, proxy_address,
218 proxy_port, tbuf, ssl_subject)
219 codec.decode(tbuf) do |event|
220 if @proxy_protocol
221 event.set(@field_proxy_host, proxy_address) unless event.get(@field_proxy_host)
222 event.set(@field_proxy_port, proxy_port) unless event.get(@field_proxy_port)
223 end
224 enqueue_decorated(event, client_ip_address, client_address, client_port, ssl_subject)
225 end
226 end
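To illustrate my (possibly wrong) reading of what's happening at line 219: here is a minimal Ruby sketch, not the plugin's actual code, showing how a msgpack frame that gets split across two TCP reads makes a one-shot decode raise, while a streaming unpacker has to buffer the partial bytes. It assumes the msgpack gem; the exact error class is whatever msgpack raises for a truncated buffer.
require "msgpack"

event = { "message" => "hello", "count" => 1 }.to_msgpack

# Simulate a TCP read boundary landing in the middle of a frame:
first_half  = event[0...event.bytesize / 2]
second_half = event[event.bytesize / 2..]

begin
  # A one-shot unpack of a partial frame raises...
  MessagePack.unpack(first_half)
rescue => e
  puts "partial frame raised: #{e.class}"
end

# ...while a streaming unpacker buffers the partial bytes and only
# yields the decoded object once the rest of the frame arrives:
unpacker = MessagePack::Unpacker.new
unpacker.feed_each(first_half)  { |obj| puts "unexpected: #{obj.inspect}" }
unpacker.feed_each(second_half) { |obj| puts "decoded: #{obj.inspect}" }
If the fluent codec (or something above it) wraps each failed decode in a StandardError that something else holds a reference to, that would match the pile of errorinfo entries we see in MAT, but that's speculation on my part.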
At this point it's not clear to me what specifically is causing these errors, or how to correct them.
I have many heap dumps and would be happy to provide more info. Like I said, I'm not an expert in Java or heap dumps, so if you would like more detail, please be explicit about how I can provide it.
It feels like these error objects are being retained on the heap. In general the entire flow works fine, and the data/numbers we see in Elastic appear to be correct. We just have these heap issues and I don't know why.
And before the elastiflow bro chimes in here, we are not interested in using elastiflow.
Thank you for any assistance you can provide.