Another High CPU usage with Logstash

mhare · October 9, 2019, 1:29pm

1st: fairly new to ElasticStack configuration

I've looked at the other topics on High CPU, but I do not believe they are helping me

I have an Ubuntu box with 8 cores and 8 G of Ram
This machine is running Logstash, Elastic Search and Kibana (and Grafana)
Logstash is indexing 8 pipelines, each ingesting a separate CSV file
Yesterday I changed the writers to the CSVs from updating every 5 min to update every 15 min
Did not seem to make any difference.
I have sincedb_path set to /dev/null and start_position is set to 'beginning'

My 1 m and 5 m load averages are at 93%
Top shows:
top - 13:17:33 up 21:49, 1 user, load average: 7.36, 7.45, 7.59
Tasks: 217 total, 1 running, 117 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.1 sy, 0.5 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8144444 total, 1681432 free, 4149572 used, 2313440 buff/cache
KiB Swap: 4194300 total, 4192508 free, 1792 used. 3700760 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3835 logstash 39 19 7287796 1.596g 27932 S 4.7 20.5 59:19.61 java
1006 elastic+ 20 0 9243596 1.936g 295228 S 2.1 24.9 63:21.69 java
6059 elkadmin 20 0 42924 4080 3340 R 0.5 0.1 0:00.04 top

and Hot Threads is showing:
::: {nj6v-elasticsearch}{OdDLDn1FT8au-x541oLPDg}{k2YYnOUKRt28TqFTrZj3GA}{localhost}{127.0.0.1:9300}{dim}{ml.machine_memory=8339910656, xpack.installed=true, ml.max_open_jobs=20}
Hot threads at 2019-10-09T13:22:29.007Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

Which I think means no hot threads? Or I executed the command wrong (GET /_nodes/hot_threads in the Kibana Console)

I had started with 4 Cores and when the CPU went high I added 4 more and it just filled up again

I did read something about the reload changing from sec to mSec. Is there a change I need to make to a config file somewhere

Anyone have an idea what's going on here?

Thanks!

Mike

rugenl · October 9, 2019, 2:55pm

Are you looking at load as a CPU indicator? Your other stats show CPU resources 99.2% idle, which is good.

I usually see logstash running "nice", if that is your case, it might be the .5% CPU used there.

How many pipeline workers do you have? I think the fault is the number of cores. I'd reduce that to at least 4 and see if it changes things. For low use pipelines, 1 or 2 workers are probably plenty, depending on the complexity of the filters.

mhare · October 9, 2019, 4:06pm

I am looking at 'top: load average' of 7.4 7.3 and 7.3, which with 8 cores is 90+%
I do not understand the line about logstash running 'nice' can you clarify?
I set my workers to 4 and now my load average is over 10
top - 15:59:06 up 1 day, 31 min, 2 users, load average: 10.54, 8.67, 7.86
Tasks: 228 total, 1 running, 124 sleeping, 0 stopped, 0 zombie
I don't know what constitutes a complex filter, but I would guess mine are all pretty tame, they just parse CSV

Thanks for helping!

Badger · October 9, 2019, 4:29pm

You do not have high CPU usage. Your CPU is 99.2% idle.

You have high load averages, which is not the same thing. (This provides some interesting history around this.)

A process contributes to load when it is running on a CPU, but it can also be counted as part of the load when it is waiting in state TASK_UNINTERRUPTIBLE. That can include waiting on disk IO or mutexes (and many other things).

How confident are you that logstash is causing the high load average? Does it drop when logstash is stopped? If it is logstash then I would start looking at stack traces for each thread in the JVM and try to figure out what it could be waiting on.

mhare · October 9, 2019, 6:57pm

Interesting, I was just looking at the log and noticed a bunch of errors reading from a Windows drive share (the CSVs are all created on a Windows box from a powershell script.) I ingest the CSV files across a Windows<->Linux file share. Maybe this isn't a best practice.
I had considered running logstash on the Windows box and let ES pull across the network to the Linux box but dismissed it. Don't even remember why, probably not a good reason.

Thank you!

mhare · October 9, 2019, 7:03pm

Just to continue..
I stopped the logstash service and top now shows
top - 19:01:56 up 1:10, 2 users, load average: 0.41, 5.02, 7.99
Tasks: 228 total, 1 running, 124 sleeping, 0 stopped, 0 zombie

rugenl · October 9, 2019, 7:29pm

Nice has to do with priority, the NI column of top. Default is 0, logstash runs as 19 by default, the lowest priority, see the "nice" command man page.

I guess load is OK when you stop logstash?

Another tweak is to set the config.reload.interval to 30 or 60 seconds, default is 3.

Len Rugen

rugenl at yahoo.com

mhare · October 9, 2019, 7:38pm

I looked at that earlier, but I never turned config.reload.automatic to true, it is commented out of the logstash.yml file, so that means it's off , right? (defaults to false) So changing the interval to 30s won't have any affect?

rugenl · October 9, 2019, 8:18pm

Correct.

I just looked at one of our main servers, 32 cores, runs elastic hot data node and logstash, load is in the teens. %CPU is 15% user, 2% sys, 9% nice, 75% idle. Logstash %CPU is 325, so 3/32 is roughly the 9% nice.

This logstash is ingesting about 3000 events/sec in 9 pipelines.

Len Rugen

rugenl at yahoo.com

Badger · October 9, 2019, 8:24pm

Can you post a stack trace for the JVM somewhere?

mhare · October 9, 2019, 9:01pm

Can you try this?
Logstash JVM Stack

Badger · October 9, 2019, 9:57pm

There are nine threads that are in Java thread state RUNNABLE and which look like this

"[taskstat-dsp-860-pipeline]<file" #115 daemon prio=5 os_prio=0 cpu=820.16ms elapsed=5783.45s tid=0x00007fd948030800 nid=0xe3a runnable  [0x00007fd9434f1000]
   java.lang.Thread.State: RUNNABLE
at com.kenai.jffi.Foreign.invokeN3O1(Native Method)
at com.kenai.jffi.Invoker.invokeN3(Invoker.java:1061)
at jnr.enxio.channels.Native$LibC$jnr$ffi$2.read(Unknown Source)
at jnr.enxio.channels.Native.read(Native.java:101)
at jnr.enxio.channels.NativeDeviceChannel.read(NativeDeviceChannel.java:76)
at org.jruby.util.io.PosixShim.read(PosixShim.java:158)
at org.jruby.util.io.OpenFile$2.run(OpenFile.java:1327)
at org.jruby.util.io.OpenFile$2.run(OpenFile.java:1317)
at org.jruby.RubyThread.executeTask(RubyThread.java:1580)
at org.jruby.util.io.OpenFile.readInternal(OpenFile.java:1390)
at org.jruby.RubyIO.sysread(RubyIO.java:3011)
at java.lang.invoke.LambdaForm$DMH/0x0000000840698440.invokeVirtual(java.base@11.0.4/LambdaForm$DMH)
[...]

I believe this is sitting on a read in a file input. It's not actually doing anything (just waiting for data to become available) but it whatever it is doing inside that native call results in it being included in the load average. Basically your load average is misleading.

If you really want to be certain then cat /proc/<pid>/task/<taskid>/stack to get the rest of the stack trace in the native code and verify in the Linux source that whatever it is doing sets the process state to TASK_UNINTERRUPTIBLE.

I think this is a non-issue.

mhare · October 9, 2019, 11:41pm

Thank you so much for your help!

Michael Hare

Sr. Manager, Firmware Engineering
Phone: 800-237-2041
Ext: 16568
Email: mhare@crestron.com
Crestron Plano
7250 Dallas Parkway, Suite 600, Plano, Texas 75024

rcowart · October 10, 2019, 12:39pm

I would concur with @Badger. The network latency inherit with that file share is what is sure to be the cause of the load as threads block waiting for network I/O.

My recommendation is to run Filebeat on the Windows box to read the CSV files, and use Filebeat's Logstash output to send the data to Logstash where you can do any additional parsing/formatting/etc.

This is the technique I use in this project...

The needed filebeat.yml config is as simple as...

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /path/to/your/*.csv

output.logstash:
  hosts: ["192.2.0.11:5044"]

And the Logstash input is...

input {
  beats {
    host => "0.0.0.0"
    port => "5044"
    client_inactivity_timeout => 180
  }
}

Then rework the rest of the pipeline to work with the data from Filebeat. At most you should only have to tweak a few field names.

mhare · October 10, 2019, 2:20pm

Thank you, I am reading about Filebeat now.
I am noticing that I am not getting the read error at the current time, and the Load Averages are not peaking, so there is another data point indicating the file share reads are contributing to the system load.

mhare · October 11, 2019, 3:46pm

Trying to figure out how to configure Filebeat.
I currently process 8 separate pipelines with logstash.
Do I need to configure a separate Filebeat.inputs and Logstash input->beats for each of the 8 pipelines, using different port numbers to differentiate between them?
How do I associate a specific input with a specific port number?

mhare · October 29, 2019, 1:15pm

To follow up, the load averages were indeed tied to reading the CSVs across the file share.
I installed Filebeat on the windows box and ship the CSVs to logstash and my load average went from 95 ~ 110% to 10%.
Thanks for all the help!

system · November 26, 2019, 1:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Single thread with high CPU usage Elasticsearch	3	2611	July 6, 2017
Elasticsearch CPU usage High Elasticsearch	5	2386	March 9, 2021
High CPU Utilized by JAVA in single node ELK #31089 Elasticsearch	12	2623	July 19, 2018
Logstash 7.5.1 High CPU Usage Logstash	7	1929	February 6, 2020
Logstash high cpu usage when idle Logstash	2	2724	January 10, 2018

Another High CPU usage with Logstash

Related topics