Grok on multiple continues lines


(Govind Raj) #1

Hi

We have a pattern as follows,

if the application works fine then not more 1 occurance of following log line will be present,

2017-02-04 17:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

<>

If application fails to load then , the logline with ntp prints continues on varying intervals as follows,

2017-02-04 17:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

2017-02-04 17:39:46 20715 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:39:46 20716 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

2017-02-04 17:43:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:43:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

If our application fails to load then the ntp calling log lines as above will appear in our device logs. Is there a way to capture the occurance of similar patterns ? we are not sure if we can use multiline here since we dont have specific start and end before and after this gets printed on the logs by the device.

Thanks
Raj


(Fabien Baligand) #2

If you aim to count how many lines with this pattern occurs in a defined time interval, and then throw warn/error alerts if you get more than N occurrences in the defined interval, then you should consider throttle filter.


(Govind Raj) #3

Thanks @fbaligand .

Let me try the same .


(Govind Raj) #4

Hi @fbaligand

I tried the folliwing but it creates no luck. what is that I am doing wrong here ?

    grok{
       match => { "message" => "%{DATA:logDate} %{NUMBER:logLineId} %{TIME:myTime} %{NOTSPACE:device}\(%{NUMBER:unUsed2}\) %{WORD:level}: %{GREEDYDATA:data}"}
    }
    date {
       match => [logDate,"yyyy-MM-dd HH:mm:ss"]
       target => "@timestamp"
    }


  if [device] == "timeman" and [level] == "Note" {
  grok {
  match => {"data" => "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org"}
  add_tag => ["throttled"]
  }
 if "throttled" in [tags]{
  throttle {
   before_count => 7
   after_count => 10
   period => 600
   max_age => 1200
   key => {"data" => "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org"}
   add_tag => "throttled22"
   }
 }

}

When I pass in the log as follows with stdin it all falls within the tag .
Sat Feb 04 17:20:01 UTC 2017 -
2017-02-04 17:20:01 20710 20:19:58.583 superman(173) Debug: TInputService.cpp:OnApplicationStateChanged application:id=2:pid=-1 stopped
2017-02-04 17:20:01 20711 20:19:58.584 sysman(157) Note: Applications have shut down - now unloading platform
2017-02-04 17:20:01 20712 20:19:58.584 sysman(157) Note: Stopping platform

Sat Feb 04 17:25:21 UTC 2017 -
2017-02-04 17:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:39:46 UTC 2017 -
2017-02-04 17:39:46 20715 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:39:46 20716 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:43:46 UTC 2017 -
2017-02-04 17:43:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:43:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:53:46 UTC 2017 -
2017-02-04 17:53:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:53:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 18:18:46 UTC 2017 -
2017-02-04 18:18:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 18:88:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 18:23:46 UTC 2017 -
2017-02-04 18:23:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 18:23:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 18:31:46 UTC 2017 -
2017-02-04 18:31:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 18:31:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Ideally what I looking for is if the log line with ntpdatewrapper occurs for more than 8 times / 10 times it should be identified by logstash shouldbe ingested to ELK from where I will create an alert for the incident.

Can you tell me what am I doing wrong here ?


(Fabien Baligand) #5

First, if you want to just add a tag, grok is not the right plugin, you should use "mutate" plugin.
grok is done to extract data from a field to fill other fields.

Then, you don't use "key" as it should be used. It should be a simple string referencing one or several field(s).
The aim is to to have one counter per key.

So up to me, the right configuration could be :

if [device] == "timeman" and [level] == "Note" and [data] == "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org" {
  throttle {
   before_count => 7
   after_count => 10
   period => 600
   max_age => 1200
   key => "%{data}"
   add_tag => ["throttled22"]
   }
 }

(Fabien Baligand) #6

If you want to add a tag only when you get more than 10 occurrences in 600s, you should not define "before_count" option.


(Govind Raj) #7

Thanks @fbaligand

It helped a lot. And yes we dont need to use before count option in this use case.

Thanks
Raj


(Govind Raj) #8

@fbaligand

I have one more question . By using agregate / multilane is there a way I check the presence of the log line after a specific incident.

For example if my ntplog line happens only after platform stopping message
2017-02-04 17:20:01 20711 20:19:58.584 sysman(157) Note: Applications have shut down - now unloading platform
2017-02-04 17:20:01 20712 20:19:58.584 sysman(157) Note: Stopping platform

Sat Feb 04 17:25:21 UTC 2017 -
2017-02-04 17:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Is there way we can use logstash to capture the event followed by another event. I tried the multiline but the confusing part is that we might get the timestamp simply printed before the real log line ( same as above) any thoughts on this. Sorry for taking this bit late, we encountered this issue when we tested in real scenario. Hence to ensure the real incident I need to capture the log line of ntp followed only by platform shutdown message.

Thanks in adavance.
Raj


(Fabien Baligand) #9

I think aggregate plugin could answer your need.
The following code allows to detect if "Stopping platform" message happened maximum 60 seconds before.

if [data] == "Stopping platform" {
  aggregate {
    task_id => "global"
    code => "map['stopping_platform'] = true"
    timeout => 60  
  }
}

if [device] == "timeman" and [level] == "Note" and [data] == "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org" {
  aggregate {
    task_id => "global"
    code => "event.tag('stopping_platform') if map['stopping_platform']"
    map_action => "update"
  }

  if "stopping_platform" in [tags] {
    throttle {
      after_count => 10
      period => 600
      max_age => 1200
      key => "%{data}"
      add_tag => ["throttled22"]
    }
  }
}

(Govind Raj) #10

We tried the same. It seems like the logs are all resulting in grokparsefailure. I tried to add tags to each aggregate to debug the issue with minimum count and period but it results in parsefailure, eventually its not getting tagged as "throttled22" .

if [level] == "Note" and [data] == "Stopping platform" {
aggregate {
task_id => "global"
code => "map['stopping_platform'] = true"
timeout => 60
add_tag => ["firstaggregate"]
remove_tag => ["_grokparsefailure"]
}
}

if [device] == "timeman" and [level] == "Note" and [data] == "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org" {
aggregate {
task_id => "global"
code => "event.tag('stopping_platform') if map['stopping_platform']"
map_action => "update"
add_tag => ["seconfaggregate"]
}

if "stopping_platform" in [tags] {
throttle {
after_count => 1
period => 60

max_age => 1200

  key => "%{data}"
  add_tag => ["throttled22"]
}

}
}


(Fabien Baligand) #11

If you have grok parse failure , it is no tied to aggregate plugin but to a grok plugin.


(Govind Raj) #12

I tried as below without any grok . in stdout I can see no failures at all. But its not getting tagged as throttled22 .

My conf.

input {
stdin {}
}

filter {
if [type] != "httpstblogs"{
grok{
match => { "message" => "%{DATA:logDate} %{NUMBER:logLineId} %{TIME:myTime} %{NOTSPACE:device}(%{NUMBER:unUsed2}) %{WORD:level}: %{GREEDYDATA:data}"}
}
date {
match => [logDate,"yyyy-MM-dd HH:mm:ss"]
target => "@timestamp"
}
if [data] == "Stopping platform" {
aggregate {
task_id => "global"
code => "map['stopping_platform'] = true"
timeout => 60
add_tag => ["firstaggregate"]
remove_tag => ["_grokparsefailure"]
}
}

if [device] == "timeman" and [level] == "Note" and [data] == "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org" {
aggregate {
task_id => "global"
code => "event.tag('stopping_platform') if map['stopping_platform']"
map_action => "update"
add_tag => ["seconfaggregate"]
}

if "stopping_platform" in [tags] {
throttle {
after_count => 3
period => 60

max_age => 1200

    key => "%{data}"
    add_tag => ["throttled22"]
  }
}

}
if [type] =="httpstblogs"
{
drop{}
}
}
}
output
{
stdout{ codec => rubydebug }
}

My sample input

2017-02-04 17:20:01 20711 20:19:58.584 sysman(157) Note: Applications have shut down - now unloading platform
2017-02-04 17:20:01 20712 20:19:58.584 sysman(157) Note: Stopping platform

Sat Feb 04 17:25:21 UTC 2017 -
2017-02-04 17:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:39:46 UTC 2017 -
2017-02-04 17:39:46 20715 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:39:46 20716 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:43:46 UTC 2017 -
2017-02-04 17:43:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:43:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:53:46 UTC 2017 -
2017-02-04 17:53:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-04 17:53:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

not sure why !!


(Fabien Baligand) #13

OK, so after some tests, I can confirm that your grok pattern is wrong.

Here's the right pattern :
^%{TIMESTAMP_ISO8601:logDate} %{NUMBER:logLineId} %{TIME:myTime} %{WORD:device}\(%{NUMBER}\) %{WORD:level}: %{GREEDYDATA:data}$

Some advices :

  • when your grok pattern does not work, remove _grokparsefailure tag is never the solution
  • you can test/debug easily your grok patterns here : https://grokdebug.herokuapp.com/
  • you can find all default grok known patterns here : https://grokdebug.herokuapp.com/patterns
    then click on "grok-patterns" link
  • if you don't use a part of the pattern, you can just not assign the pattern to a field : %{NUMBER}
  • don't forget a grok pattern is a regex pattern. So you have to escape special regex chars, like ()
  • for performance reasons, if you have the full pattern (from beggining to end), start with ^ and finish with $

(Govind Raj) #14

Hi @fbaligand

I treid the same as suggested with the pattern you shared still I dont see the logline getting tagged as "throttled2" . in your testing did you get the stdout tagged as expected. Would be great if you share the same with me. Here is my conf and sample input.

Config
input {
stdin {}
}

filter {
if [type] != "httpstblogs"{
grok{
match => { "message" => "^%{TIMESTAMP_ISO8601:logDate} %{NUMBER:logLineId} %{TIME:myTime} %{WORD:device}(%{NUMBER}) %{WORD:level}: %{GREEDYDATA:data}$" }
}
date {
match => [logDate,"yyyy-MM-dd HH:mm:ss"]
target => "@timestamp"
}
if [data] == "Stopping platform" {
aggregate {
task_id => "global"
code => "map['stopping_platform'] = true"
timeout => 600
}
}

if [device] == "timeman" and [level] == "Note" and [data] == "NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org" {
aggregate {
task_id => "global"
code => "event.tag('stopping_platform') if map['stopping_platform']"
map_action => "update"
}

if "stopping_platform" in [tags] {
throttle {
after_count => 2
period => 1200
max_age => 1200
key => "%{data}"
add_tag => ["throttled22"]
}
}
}
if [type] =="httpstblogs"
{
drop{}
}
}
}
output
{
stdout{ codec => rubydebug }
}

Sample input

Sat Feb 04 17:20:01 UTC 2017 -
2017-02-12 07:20:01 20710 20:19:58.583 superman(173) Debug: TInputService.cpp:OnApplicationStateChanged application:id=2:pid=-1 stopped
2017-02-12 07:20:01 20711 20:19:58.584 sysman(157) Note: Applications have shut down - now unloading platform
2017-02-12 07:20:01 20712 20:19:58.584 sysman(157) Note: Stopping platform

Sat Feb 04 17:25:21 UTC 2017 -
2017-02-12 07:25:21 20713 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-12 07:25:21 20714 20:25:21.251 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:39:46 UTC 2017 -
2017-02-12 07:39:46 20715 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-12 07:39:46 20716 20:39:44.213 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:43:46 UTC 2017 -
2017-02-12 07:43:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-12 07:43:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

Sat Feb 04 17:44:46 UTC 2017 -
2017-02-12 07:44:46 20717 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:ntpdate try ntp server: asia.pool.ntp.org
2017-02-12 07:44:46 20718 20:43:44.224 timeman(170) Note: NtpdateWrapper:: StartNtpdateProcess:Calling /usr/sbin/ntpdate -q -p 1 asia.pool.ntp.org

would be great if you could tell me what am I doing wrong here .

Thanks in advance
Govind


(Fabien Baligand) #15

OK, this time, I tested the configuration. So :

first thing, in your grok pattern, you must escape ( and ) :

grok{
match => { "message" => "^%{TIMESTAMP_ISO8601:logDate} %{NUMBER:logLineId} %{TIME:myTime} %{WORD:device}\(%{NUMBER}\) %{WORD:level}: %{GREEDYDATA:data}$" }
} 

second thing : the aggregate task_id must be a dynamic expression. My mistake : I told you to set "global".
So here's a potential right task_id (that works fine on my test) :
task_id => "%{host}"


(Govind Raj) #16

Hi @fbaligand

Thanks a ton !! Seems like it saved my week !! Now we got that working as expected . I did change the task_id into a different field (like filename of the source) since we have input from 20-100 files . And now it works fine. 100 (y).

Thanks
Govind


(Fabien Baligand) #17

Great news !
Happy to see I helped you to solve your problem !

To close the topic, could you mark my previous comment as "solution" ?


(Govind Raj) #18

Done.

Thanks again .
Govind


(Fabien Baligand) #19

You're welcome :slight_smile:


(system) #20

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.