CPU usage grows to 100%, index rate drops to almost 0

Hey,

Configuration
Elasticsearch cluster (3 master nodes, 2 coordinating nodes, 6 data nodes) + 2 Logstash (5.2.1) nodes
Elasticsearch cluster status is green:
Nodes: 11
Indices: 23
Memory: 57GB / 249GB
Total Shards: 150
Unassigned Shards: 0
Documents: 3,607,973,417
Data: 5TB
Uptime: 3 days
Version: 5.2.1

Logstash filter:

http://pastebin.com/KkZ51xGk

Logstash uses the default config.
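
In case the pastebin link ever goes stale: the filter is essentially csv-based. A minimal sketch of that kind of filter for Bluecoat access logs (the column names below are placeholders for illustration, not our exact config):

filter {
  csv {
    # Bluecoat access logs are space-separated W3C ELF
    separator => " "
    columns => ["date", "time", "time_taken", "c_ip", "sc_status", "cs_method", "cs_uri"]
  }
}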

Problem
Each of the Logstash nodes starts without any problems and imports at an index rate of ~20k events/sec. There are some errors in the Logstash logs like:

Error parsing csv [ . . . ]  :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
Received an event that has a different character encoding than you configured. [ . . . ] :expected_charset=>"UTF-8"}

We are aware of them; those are known problems with Bluecoat log files. 60k out of 3.6 billion is "ok".
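
If we ever wanted to keep those malformed lines out of the pipeline entirely, a sketch of one option would be to drop events the csv filter could not parse, since it tags them with _csvparsefailure:

filter {
  # discard events the csv filter failed to parse
  if "_csvparsefailure" in [tags] {
    drop { }
  }
}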

At some point the index rate drops to ~300 events/sec and CPU usage grows to 100% on all CPU cores. This happens on both Logstash nodes, but independently of each other, e.g. on Logstash node 1 it occurs after 3 hours and on Logstash node 2 after 7 hours.

# ps -ef | grep java
root     21948 21041 99 Feb19 pts/2    12-07:39:39 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+HeapDumpOnOutOfMemoryError -Xmx1g -Xms256m -Xss2048k -Djffi.boot.library.path=/usr/share/logstash/vendor/jruby/lib/jni -Xbootclasspath/a:/usr/share/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/share/logstash/vendor/jruby -Djruby.lib=/usr/share/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main /usr/share/logstash/lib/bootstrap/environment.rb logstash/runner.rb --path.settings=/etc/logstash/
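
Since the process runs with the default -Xmx1g heap, one thing we plan to check next is whether the JVM is stuck in GC when the slowdown hits, e.g. by sampling GC activity (21948 is the Logstash PID from above):

# jstat -gcutil 21948 1000 5

If the old-generation column (O) sits near 100% and the full-GC count (FGC) keeps climbing, that would point at heap pressure rather than at the filter.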

# top -Hp 21948 | head -n23
top - 11:11:25 up 27 days, 21:10,  1 user,  load average: 15.93, 15.56, 14.10
Threads:  84 total,   5 running,  79 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.2 us,  0.1 sy,  0.0 ni, 92.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65864420 total,  2103748 free,  1371972 used, 62388700 buff/cache
KiB Swap: 15615996 total, 15615996 free,        0 used. 63891584 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
22024 root      20   0 8244812 1.089g  18872 S 99.9  1.7   1039:01 [main]>worker11
22026 root      20   0 8244812 1.089g  18872 S 99.9  1.7   1038:29 [main]>worker13
22028 root      20   0 8244812 1.089g  18872 S 99.9  1.7   1039:25 [main]>worker15
22013 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:54 [main]>worker0
22014 root      20   0 8244812 1.089g  18872 R 93.8  1.7   1038:46 [main]>worker1
22015 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1039:30 [main]>worker2
22016 root      20   0 8244812 1.089g  18872 R 93.8  1.7   1039:05 [main]>worker3
22017 root      20   0 8244812 1.089g  18872 R 93.8  1.7   1039:03 [main]>worker4
22018 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:11 [main]>worker5
22019 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:59 [main]>worker6
22020 root      20   0 8244812 1.089g  18872 R 93.8  1.7   1038:29 [main]>worker7
22021 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:54 [main]>worker8
22022 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1039:03 [main]>worker9
22023 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:58 [main]>worker10
22025 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1038:42 [main]>worker12
22027 root      20   0 8244812 1.089g  18872 S 93.8  1.7   1039:16 [main]>worker14
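
To see what those worker threads are actually doing when this happens, I also plan to pull the hot threads and per-plugin pipeline stats from the Logstash node API (9600 is the default API port, adjust if changed):

# curl -s 'localhost:9600/_node/hot_threads?threads=16&pretty'
# curl -s 'localhost:9600/_node/stats/pipeline?pretty'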

Any ideas how I might find out whether there is a problem within the filter, or any other ideas? There are no errors/warnings in the ES or Logstash logs at the point the problem occurs.

Thanks
Andreas

Do you have the Monitoring plugin installed? If not, definitely start there.
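
For Logstash 5.2 that means installing X-Pack and pointing its monitoring at your cluster, roughly like this (the ES URL is a placeholder):

# /usr/share/logstash/bin/logstash-plugin install x-pack

and in logstash.yml:

xpack.monitoring.elasticsearch.url: ["http://your-es-host:9200"]

That gives you per-plugin event counts and timings in Kibana, which should show whether a single filter is eating the CPU.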
