Logstash Stops Processing


(Kotter) #1

Hello all:

I'm working with a logstash docker image (https://registry.hub.docker.com/u/helder/logstash/) linked to elasticsearch and kibana, all via docker.

My problem is that logstash stops processing randomly. I only have 2 window hosts sending messages back via nxlog on a narrow subset of application text logs.

My questions are as follows, any help would be greatly appreciated:

  1. where does logstash log it's own events? I cannot find any logfiles related to logstash in the container via /var/log, and I do not see any *.log files in /opt/logstash

  2. would a singular message cause logstash to stop processing all-together?

My logstash.conf file is as follows:

input {
  tcp {
    type => "iseapps"  
    codec => json
    port => 5514
  }
}

filter { 
  if [type] == "iseapps" {
    grok {
      break_on_match => false
      match => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}SocketEngine WriteSock%{SPACE}%{GREEDYDATA:infomsg}"]
      match => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}GTS_ORA start processing%{SPACE}%{GREEDYDATA:infomsg}"]
match => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTYDATA:infomsg}"]}\)%{SPACE}\(\)%{SPACE}GTS_ORA -----> POB%{SPACE}%{GREED--More--(12%)
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}GTS_ORA processing%{SPACE}%{GREEDYDATA:infomsg}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}CustomRule Tag%{GREEDYDATA:infomsg}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}GTS_ORA%{GREEDYDATA:infomsg}, gtsSeqNum:%{SPACE}%{NOTSPACE:gtsSeqNum} fixSeqNum: %{NOTSPACE:fixSeqNum} msgType: %{NOTSPACE:msgType} requestId: %{NOTSPACE:requestId} dtiSendingTime: %{NOTSPACE:dtiSendingTime} %{SPACE}  GWSession \[%{NOTSPACE:GWSession}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}GTS_ORA%{GREEDYDATA:infomsg}, clOrdId: %{NOTSPACE:clOrdId} securityId: %{NOTSPACE:securityId} lastQty: %{NOTSPACE:lastQty} lastPx: %{NOTSPACE:lastPx} trdRptId: %{NOTSPACE:trdRptId} trdLinkId: %{NOTSPACE:trdLinkId} trdTransType: %{NOTSPACE:trdTransType} msgEventSrc: %{NOTSPACE:msgEventSrc}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPA--More--CE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}GTS_ORA%{GREEDYDATA:infomsg}, seqNum: %{NOTSPACE:seqNum} securityId: %{NOTSPACE:securityId}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}%{GREEDYDATA:infomsg}%{SPACE}gtsSeqNum: %{NOTSPACE:gtsSeqNum} execId: %{NOTSPACE:execId} requestMsgId: %{NOTSPACE:requestMsgId} clOrdId: %{NOTSPACE:clOrdId}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}%{GREEDYDATA:infomsg}%{SPACE}ClOrdId: %{NOTSPACE:clOrdId}%{SPACE}OrigClOrdId: %{NOTSPACE:OrigClOrdId}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\)%{SPACE}%{GREEDYDATA:infomsg}%{SPACE}ClOrdId: %{NOTSPACE:clOrdId}%{SPACE}"]
      match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}\(%{NOTSPACE:session_id}\)%{SPACE}\(\) %{GREEDYDATA} -- SecurityId %{NOTSPACE:SecurityId}, %{GREEDYDATA} SecurityId %{NOTSPACE:SecurityId}"]
      add_tag => ["info"]
      remove_tag => ["_grokparsefailure"]
    }

    # ReferenceData Exception
    if "_grokparsefailure" in [tags] {
      grok {
        break_on_match => false
        match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}%{NOTSPACE:id_3}%{SPACE}%{NOTSPACE:filler}%{SPACE}%{NOTSPACE:ISEApps_type}%{SPACE}Exception while initializing ReferenceData.%{SPACE}%{NOTSPACE:Library}%{SPACE}Message request timed out:%{SPACE}sessId=%{NOTSPACE:session_id}"]
        add_tag    => ["ReferenceData_Exception"]
        remove_tag => ["_grokparsefailure"]
      }
    }
    # ReferenceData Service Failed
    if "_grokparsefailure" in [tags] {
      grok {
        break_on_match => false
        match   => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}%{
NOTSPACE:id_3}%{SPACE}%{NOTSPACE:filler}%{SPACE}%{NOTSPACE:ISEApps_type}%{SPACE}ReferenceData Service failed or unavailable."]
        add_tag    => ["ReferenceData_Service_Failed"]
remove_tag => ["_grokparsefailure"]
      }
    }
    # Process Socket Closed
    if "_grokparsefailure" in [tags] {
      grok {
        break_on_match => false
        match     => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}(%{NOTSPACE:session_id})%{SPACE}%{NOTSPACE:filler}%{SPACE}SocketEngine ProcessSocketClosed - WickSock exception caught%{SPACE}%{NOTSPACE:Exception}%{SPACE}%{NOTSPACE:excep_id}%{SPACE}An existing connection was forcibly closed by the remote host"]
        add_tag    => ["Process_Socket_Closed"]
        remove_tag => ["_grokparsefailure"]
      }
    }
    # Socket Write Error
    if "_grokparsefailure" in [tags] {
      grok {
        break_on_match => false
        match     => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%--More--{SPACE}(%{NOTSPACE:session_id})%{SPACE}%{NOTSPACE:filler}%{SPACE}SocketEngine WriteSock - error%{SPACE}%{NOTSPACE:Exception}%{SPACE}%{NOTSPACE:excep_id}%{SPACE}A request to send or receive data was disallowed because the socket is not connected"]
        add_tag    => ["Socket_Write_Error"]
        remove_tag => ["_grokparsefailure"]
      }
    }
    # Socket Logout Exception
    if "_grokparsefailure" in [tags] {
      grok {
        break_on_match => false
        match     => ["message", "%{NOTSPACE:errorlevel}%{SPACE}%{NOTSPACE:date}%{SPACE}%{HAPROXYTIME:log_time}%{SPACE}%{NOTSPACE:id_1}%{SPACE}%{NOTSPACE:id_2}%{SPACE}(%{NOTSPACE:session_id})%{SPACE}%{NOTSPACE:filler}%{SPACE}SocketEngine Send - Logout exception%{SPACE}%{NOTSPACE:Exception}%{SPACE}%{NOTSPACE:Exception}%{SPACE}%{NOTSPACE:excep_id}%{SPACE}A request to send or receive data was disallowed because the socket is not connected"]
        add_tag    => ["Socket_Logout_Exception"]
        remove_tag => ["_grokparsefailure"]
      }
    }
  }
}            

output {
  elasticsearch { host => elasticsearch }
}

I'm relatively new at this, so I am trying to understand how I can best troubleshoot what could be causing the processing to stop with no warnings or issues generated.

Many thanks again for any help!


(Kotter) #2

I also see that logstash is crushing a CPU core, wondering if this has to do with my logstash.conf file and if there is perhaps a more efficient way to process messages beyond the grok filters that look to be causing heavy CPU cycles:

top - 13:57:37 up 17 days, 13 min,  1 user,  load average: 0.98, 1.01, 0.96
Tasks: 225 total,   1 running, 224 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.4 us,  0.1 sy,  0.0 ni, 93.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  24748300 total, 11516964 used, 13231336 free,     4080 buffers
KiB Swap:        0 total,        0 used,        0 free.  9712464 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                             
20186 root      20   0 8755880 426956  21036 S 102.4  1.7   5:13.90 java                                                                                                
18509 root      20   0  9.779g 504752  26224 S   1.0  2.0   1:22.88 java                                                                                                
    1 root      20   0  175908   5328   3608 S   0.0  0.0   2:59.10 systemd                                                                                             
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.98 kthreadd                                                                                            
    3 root      20   0       0      0      0 S   0.0  0.0   2:34.89 ksoftirqd/0                                                                                         
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                        
    8 root      20   0       0      0      0 S   0.0  0.0  25:31.81 rcu_sched                                                                                           
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                              
   10 root      rt   0       0      0      0 S   0.0  0.0   0:03.55 migration/0                                                                                         
   11 root      rt   0       0      0      0 S   0.0  0.0   0:01.92 watchdog/0                                                                                          
   12 root      rt   0       0      0      0 S   0.0  0.0   0:01.22 watchdog/1                                                                                          
   13 root      rt   0       0      0      0 S   0.0  0.0   0:02.87 migration/1                                                                                         
   14 root      20   0       0      0      0 S   0.0  0.0   0:03.43 ksoftirqd/1                                                                                         
   16 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                                                                                        
   18 root      rt   0       0      0      0 S   0.0  0.0   0:01.51 watchdog/2                                                                                          
   19 root      rt   0       0      0      0 S   0.0  0.0   0:04.21 migration/2                                                                                         
   20 root      20   0       0      0      0 S   0.0  0.0   0:14.95 ksoftirqd/2                                                                                         
   22 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H                                                                                        
   23 root      rt   0       0      0      0 S   0.0  0.0   0:01.31 watchdog/3                                                                                          
   24 root      rt   0       0      0      0 S   0.0  0.0   0:03.17 migration/3                                                                                         
   25 root      20   0       0      0      0 S   0.0  0.0   0:03.22 ksoftirqd/3                                                                                         
   27 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H                                                                                        
   28 root      rt   0       0      0      0 S   0.0  0.0   0:01.47 watchdog/4                                                                                          
   29 root      rt   0       0      0      0 S   0.0  0.0   0:03.60 migration/4                                                                                         
   30 root      20   0       0      0      0 S   0.0  0.0   0:11.60 ksoftirqd/4                                                                                         
   32 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/4:0H                                                                                        
   33 root      rt   0       0      0      0 S   0.0  0.0   0:01.28 watchdog/5                                                                                          
   34 root      rt   0       0      0      0 S   0.0  0.0   0:03.15 migration/5                                                                                         
   35 root      20   0       0      0      0 S   0.0  0.0   0:03.22 ksoftirqd/5                                                                                         
   37 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/5:0H                                                                                        
   38 root      rt   0       0      0      0 S   0.0  0.0   0:01.40 watchdog/6                                                                                          
   39 root      rt   0       0      0      0 S   0.0  0.0   0:03.25 migration/6                                                                                         
   40 root      20   0       0      0      0 S   0.0  0.0   0:09.14 ksoftirqd/6                                                                                         
   42 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/6:0H                                                                                        
   43 root      rt   0       0      0      0 S   0.0  0.0   0:01.28 watchdog/7                                                                                          
   44 root      rt   0       0      0      0 S   0.0  0.0   0:02.50 migration/7                                                                                         
   45 root      20   0       0      0      0 S   0.0  0.0   0:03.35 ksoftirqd/7                                                                                         
   47 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/7:0H                                                                                        
   48 root      rt   0       0      0      0 S   0.0  0.0   0:01.48 watchdog/8                                                                                          
   49 root      rt   0       0      0      0 S   0.0  0.0   0:03.22 migration/8                                                                                         
   50 root      20   0       0      0      0 S   0.0  0.0   0:01.26 ksoftirqd/8                                                                                         
   52 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/8:0H                                                                                        
   53 root      rt   0       0      0      0 S   0.0  0.0   0:01.31 watchdog/9                                                                                          
   54 root      rt   0       0      0      0 S   0.0  0.0   0:02.84 migration/9                                                                                         
   55 root      20   0       0      0      0 S   0.0  0.0   0:00.85 ksoftirqd/9                                                                                         
ic-dock02 conf.d # docker top 2cff64a3f730
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                20186               745                 99                  13:52               ?                   00:05:16            /usr/bin/java -Xmx500m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -jar /opt/logstash/vendor/jar/jruby-complete-1.7.11.jar -I/opt/logstash/lib /opt/logstash/lib/logstash/runner.rb agent -f /etc/logstash/conf.d/

(Chris M) #3

Hi. I had a very similar problem, and it looks like you're suffering from the same issues. You have a lot of repetition in your grok filters, the more times you search for the same thing, the longer those grok filters take. Eventually the logstash thread crashes, preceeded by pegging the CPU to kingdom-come. I wrote an overview of how I fixed that here: http://ghost.frodux.in/logstash-grok-speeds/ Lemme know if you have more questions that I can answer.


(system) #4