Presumably UDP input threads are soaking up CPU

We have Logstash running on Ubuntu, version 8.4.1 from the Ubuntu repositories. We have a number of pipelines running; most of them work fine, but the CPU utilization on the box is a bit high.

From the Linux command line, I can see the Java PID. Using top to isolate the threads in that Java PID (top -Hp <java_pid_here>), I can see the following:

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    2226708 logstash  39  19   32.0g  23.1g 446280 R  92.9  74.9  40:12.53 <udp.1
    2226710 logstash  39  19   32.0g  23.1g 446280 R  92.3  74.9  40:10.88 <udp.2
    2192930 logstash  39  19   32.0g  23.1g 446280 S  56.7  74.9  35:42.86 [REDAC]>worke
    2226704 logstash  39  19   32.0g  23.1g 446280 S  11.2  74.9   4:59.74 [REDAC]>wor
    3795182 logstash  39  19   32.0g  23.1g 446280 S   9.3  74.9  52:11.30 Ruby-0-Thread-4
    2226703 logstash  39  19   32.0g  23.1g 446280 S   8.3  74.9   4:57.91 [REDAC]>wor
    3785556 logstash  39  19   32.0g  23.1g 446280 S   7.1  74.9  36:17.43 [REDAC-SYSL

The udp.1 and udp.2 threads are what I'm mostly interested in, but I cannot find any information on what these are, if they're attached to a pipeline, etc.

I'm going to guess these are UDP input threads from pipelines? What specifically are these, and how can I get additional information on them? This is a CLI-only box, so it's unlikely we're going to use VisualVM. I've tried the jstack that's bundled with Elastic, but it doesn't give me much additional information (or perhaps I'm reading it wrong).

TL;DR: What are these threads under the Logstash Java process, and how can I get additional info on them to troubleshoot their high CPU usage?

By default, a udp input will have two processing threads. The < in the thread name tells you those are input threads. So ... you have a udp input somewhere. You need to look at the configuration. A udp input doesn't do much other than decode the packet and flush it to the queue. It's possible that the codec on your input is failing expensively. Maybe not.
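If you're not sure which pipeline declares that input, something as simple as the sketch below will find it. This assumes a standard package install with pipeline configs under /etc/logstash/conf.d; adjust the path (and check pipelines.yml) if your layout differs.

    # list every pipeline config file that mentions a udp input
    grep -rn 'udp' /etc/logstash/conf.d/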

There are monitoring APIs that might help you: one is hot threads, another is pipeline stats.
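A rough sketch of querying both, assuming the Logstash API is listening on its default localhost:9600 (adjust the host/port if you've changed the API settings in logstash.yml):

    # hot threads: shows which threads are busiest and a sample of their stacks
    curl -s 'localhost:9600/_node/hot_threads?human=true&threads=10'

    # per-pipeline stats: event counts and time spent per input/filter/output
    curl -s 'localhost:9600/_node/stats/pipelines?pretty'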

A basic Java debugging technique would be to take a thread dump every 10 seconds for a couple of minutes and see if that shows where those two input threads are spending all their time.
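For example, a minimal loop along these lines. The path to the bundled jstack and the pgrep pattern are assumptions for a standard package install; substitute the java PID you used with top -Hp if that's easier, and run it as the same user as Logstash (e.g. via sudo -u logstash) so jstack can attach.

    # take 12 thread dumps, one every 10 seconds, into a single file
    LS_PID="$(pgrep -f org.logstash.Logstash)"    # or set it to the java PID from top -Hp
    JSTACK=/usr/share/logstash/jdk/bin/jstack     # adjust if the bundled JDK lives elsewhere
    for i in $(seq 1 12); do
        "$JSTACK" "$LS_PID" >> /tmp/logstash-thread-dumps.txt
        sleep 10
    done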


This is correct; we do have a pipeline with a udp input. It appears to be only one pipeline, so two processing threads makes sense.

This particular input uses the sFlow codec, so it's plausible it's something in there, although the input itself is fairly simple (just the codec and the UDP port number), roughly as sketched below.
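Something like this shape (the port and id here are placeholders, not the real values):

    input {
      udp {
        id    => "sflow_in"   # placeholder id
        port  => 6343         # placeholder port
        codec => sflow
        # workers defaults to 2, which lines up with the <udp.1 / <udp.2 threads
      }
    }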

By Java thread dump, do you mean something like jstack? Sorry, I'm not exactly well versed in debugging Java.

I really appreciate the response; this has been good information so far.

jstack should be OK. It should let you see what is at the top of the call stack most of the time, and that is where the time is being spent.

If this were Netflow, I would suspect something like flow set mapping, but I don't think sFlow does that.

Yeah, not sure why my posts aren't showing up, but here's what this looks like:

"<udp.1" #7154 daemon prio=5 os_prio=0 cpu=2301084.71ms elapsed=2711.36s tid=0x00007fe44812a370 nid=0x272bed runnable  [0x00007fe4c39f9000]
   java.lang.Thread.State: RUNNABLE
        at org.jruby.RubyModule.searchWithCacheAndRefinements(RubyModule.java:1566)
        at org.jruby.RubyModule.searchWithCache(RubyModule.java:1535)
        at org.jruby.ir.targets.indy.InvokeSite.fail(InvokeSite.java:270)

The Logstash sFlow and Netflow codecs both use bindata, which is especially slow. The codec also scales poorly across multiple cores/vCPUs. See the following numbers for the Netflow codec (the sFlow codec is even worse): Processing performance issue · Issue #85 · logstash-plugins/logstash-codec-netflow · GitHub

This is one of the primary reasons that the ElastiFlow solution moved on from Logstash and we created a brand-new collector from scratch. Besides supporting A LOT more sFlow structures and protocols when decoding sFlow sample headers, the performance is more than 60x what is possible with Logstash on the same hardware.

If you have a requirement for high performance/throughput, and support for sFlow sampled traffic beyond simple ethernet and basic protocols, you will need to find a different solution than Logstash.
