We have Logstash 8.4.1 running on Ubuntu, installed from the Ubuntu repositories. We have a number of pipelines running and most of them work fine, but the CPU utilization on the box is a bit high.
From the Linux command line, I can see the java pid. Using top to isolate the threads in that java pid (top -Hp <java_pid_here>), I can see the following:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2226708 logstash 39 19 32.0g 23.1g 446280 R 92.9 74.9 40:12.53 <udp.1
2226710 logstash 39 19 32.0g 23.1g 446280 R 92.3 74.9 40:10.88 <udp.2
2192930 logstash 39 19 32.0g 23.1g 446280 S 56.7 74.9 35:42.86 [REDAC]>worke
2226704 logstash 39 19 32.0g 23.1g 446280 S 11.2 74.9 4:59.74 [REDAC]>wor
3795182 logstash 39 19 32.0g 23.1g 446280 S 9.3 74.9 52:11.30 Ruby-0-Thread-4
2226703 logstash 39 19 32.0g 23.1g 446280 S 8.3 74.9 4:57.91 [REDAC]>wor
3785556 logstash 39 19 32.0g 23.1g 446280 S 7.1 74.9 36:17.43 [REDAC-SYSL
The udp.1 and udp.2 threads are what I'm mostly interested in, but I cannot find any information on what these are, if they're attached to a pipeline, etc.
I'm going to guess these are udp input threads from pipelines? What specifically are these, and how can I get additional information on them? This is a CLI-only box, so it's unlikely we're going to use VisualVM. I've tried the jstack that's bundled with Elastic, but it doesn't give me much additional information (or perhaps I'm reading it wrong).
TL;DR: What are these threads under the logstash java thread and how can I get additional info on them to troubleshoot their high CPU usage?
By default, a udp input will have two processing threads. The < in the thread name tells you those are input threads. So you have a udp input somewhere, and you need to look at the configuration to find it. A udp input doesn't do much other than decode the packet and flush it to the queue. It's possible that the codec on your input is failing expensively. Maybe not.
A basic Java debugging technique would be to take a thread dump every 10 seconds for a couple of minutes and see if that shows where those two input threads are spending all their time.
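To make that concrete, here is a minimal sketch of the "dump every 10 seconds" approach. It assumes jstack is on the PATH and is run as the same user as the Logstash process, and that the PID column from top -Hp corresponds to the nid field in the jstack thread header (printed in hex by the JDK 17 era tools, in decimal by some newer ones, so the sketch accepts both). The helper names lwp_to_nid, extract_thread and take_dumps are made up for illustration and are not part of Logstash or the JDK.

```python
import re
import subprocess
import sys
import time


def lwp_to_nid(lwp):
    """Convert a decimal thread ID from `top -H` to the hex form used in nid=0x..."""
    return hex(lwp)


def extract_thread(dump, lwp):
    """Return the stack block whose header carries this native thread ID, or ""."""
    # Accept both nid=0x21fa14 (hex) and nid=2226708 (decimal) header styles.
    wanted = re.compile(r"\bnid=(?:%s|%d)\b" % (re.escape(lwp_to_nid(lwp)), lwp))
    # Thread blocks in a jstack dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump):
        header = block.lstrip().split("\n", 1)[0]
        if header.startswith('"') and wanted.search(header):
            return block.strip()
    return ""


def take_dumps(java_pid, lwps, count=12, interval=10.0):
    """Run jstack `count` times, `interval` seconds apart, printing only the chosen threads."""
    for i in range(count):
        result = subprocess.run(
            ["jstack", str(java_pid)], capture_output=True, text=True, check=True
        )
        print(f"--- dump {i + 1} at {time.strftime('%H:%M:%S')} ---")
        for lwp in lwps:
            stack = extract_thread(result.stdout, lwp)
            print(stack or f"(no thread with nid {lwp_to_nid(lwp)} in this dump)")
            print()
        if i < count - 1:
            time.sleep(interval)


# Usage (hypothetical file name): python3 dumps.py <java_pid> <thread_id> [<thread_id> ...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    take_dumps(int(sys.argv[1]), [int(arg) for arg in sys.argv[2:]])
```

With the numbers from the top output above, you would pass the java pid followed by 2226708 and 2226710. If the same frames sit at the top of those two stacks in most of the dumps, that is probably where the CPU time is going.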
This is correct, we do have a pipeline with a udp input. It appears to be the only one, so two processing threads makes sense.
This particular input uses the sflow codec, so it's plausible the problem is somewhere in there, although the input config is fairly simple (it just lists the codec and the udp port number).
By java thread dump do you mean like jstack or something? Sorry I'm not exactly well versed in debugging java stuff....
I really appreciate the response and this has been good information to see thus far.
If you have a requirement for high performance/throughput, and support for sFlow sampled traffic beyond simple ethernet and basic protocols, you will need to find a different solution than Logstash.