Can someone help with the grok? I can't figure out how to make it work.


(Kenneth Fribert) #1

I have this type of logfile:

Events of MHSendMessagesService

                               * STOP *

                          0320 *** START ***
                                 * Module: C:\Program Files (x86)\PST\Million Handshakes\Dialog Server\MHSendMessagesService.exe
                                 * Version: 6.1.4.931
                                 * Computer: DELTA40001
                                 * UserName: PDsvc-DELTA40001
                                 * Address IP: 10.100.1.130
   2019-01-07  8:36:19.843 1390 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  8:36:19.844 1390 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  8:36:20.763 1390 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  8:36:20.763 1390 [DELTA -] Sent 1 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  8:36:20.764 1390 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  8:36:20.787 1390 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  8:36:20.788 1390 [DELTA -] Completed sending messages on output channel 1001, queue 0
  2019-01-07  9:21:18.409 13C4 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  9:21:18.410 13C4 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:21:19.498 13C4 [DELTA -] End DialogServer.SendMessages last 1 seconds
  2019-01-07  9:21:19.499 13C4 [DELTA -] Sent 3 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  9:21:19.500 13C4 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:21:19.517 13C4 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:21:19.518 13C4 [DELTA -] Completed sending messages on output channel 1001, queue 0
  2019-01-07  9:35:24.911 13FC [DELTA -] Starting to send messages on output channel 1022, queue 0
  2019-01-07  9:35:24.911 1484 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  9:35:24.911 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:24.912 1484 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.032 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.033 13FC [DELTA -] Sent 1 messages in transaction 1 on output channel 1022, queue 0
  2019-01-07  9:35:25.034 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.144 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.145 13FC [DELTA -] Sent 1 messages in transaction 2 on output channel 1022, queue 0
  2019-01-07  9:35:25.146 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.225 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.225 13FC [DELTA -] Sent 1 messages in transaction 3 on output channel 1022, queue 0
  2019-01-07  9:35:25.226 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.300 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.301 13FC [DELTA -] Sent 1 messages in transaction 4 on output channel 1022, queue 0
  2019-01-07  9:35:25.302 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.369 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.370 13FC [DELTA -] Sent 1 messages in transaction 5 on output channel 1022, queue 0
  2019-01-07  9:35:25.370 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.388 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.388 13FC [DELTA -] Completed sending messages on output channel 1022, queue 0
  2019-01-07  9:35:26.169 1484 [DELTA -] End DialogServer.SendMessages last 1 seconds
  2019-01-07  9:35:26.169 1484 [DELTA -] Sent 5 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  9:35:26.170 1484 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:26.198 1484 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:26.199 1484 [DELTA -] Completed sending messages on output channel 1001, queue 0

So I have 4 types of lines that I want to react to:

Starting
Sent
End
Completed

Sent and End can appear multiple times between a Starting and a Completed, and lines from different tasks can be interleaved, so at a later stage I will have to use aggregate.
The rest I just want to ignore.

So I wanted to create a grok filter.

First line to handle (the lines all start with two spaces):

2019-01-07 7:06:57.948 0320 [DELTA -] Starting to send messages on output channel 1001, queue 0

This could match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} to send %{GREEDYDATA}$"

Second line to handle:

2019-01-07 7:21:58.554 11EC [DELTA -] End DialogServer.SendMessages last 3 seconds

Match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$"}"

Third line:

2019-01-07 7:37:45.074 13F8 [DELTA -] Sent 15 messages in transaction 1 on output channel 1001, queue 0

Match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{INT:number1} messages %{GREEDYDATA}$"

Fourth line:

2019-01-07 9:35:26.199 1484 [DELTA -] Completed sending messages on output channel 1001, queue 0

Match, the same structure as the first:

"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} sending %{GREEDYDATA}$"

I'm a bit unsure of how to do this. Are the matches tried in order?
So could I do it like this?

 grok {
   match => {
    "message" => [
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} to send %{GREEDYDATA}$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{INT:number1} messages %{GREEDYDATA}$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} sending %{GREEDYDATA}$"
      ]
    }
 }

I then created a special Filebeat config to test this, and in the output I see 2 '_grokparsefailure' entries in the tags for each event.
So as I understand it, it will add a failure tag for each non-matching pattern?
Also, none of the field names I defined in the grok filter are showing up in the debug output.

What on earth am I doing wrong?


#2

No; provided at least one of the patterns matches a line, you should not get a failure tag.
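
To spell out the mechanics: a single grok filter tries the patterns in its array in order and stops at the first match (break_on_match defaults to true), and it adds _grokparsefailure at most once per event, only when none of its patterns match. A minimal sketch (patternA and patternB are placeholders):

grok {
   # patterns are tried top to bottom; first match wins
   match => { "message" => [ "patternA", "patternB" ] }
   # this is the default, shown only for clarity
   tag_on_failure => [ "_grokparsefailure" ]
}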

When I take one of your lines and run it against your patterns, it matches just fine:

      "time" => "7:06:57.948",
    "taskid" => "0320",
    "status" => "Starting",
   "message" => "  2019-01-07 7:06:57.948 0320 [DELTA -] Starting to send messages on output channel 1001, queue 0",
      "date" => "2019-01-07",
  "instance" => "DELTA"

I suggest using output { stdout { codec => rubydebug } } and examining the incoming message field carefully to make sure it is what you expect.

Also, you cannot have a space embedded in a field name (note that the field is called instance, not instance name).
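
That is, capture it as %{WORD:instance}; if you want the spaced label for display, you can rename the field afterwards (a sketch):

mutate {
   # a space is fine in an event field name, just not in a grok capture name
   rename => { "instance" => "Instance name" }
}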

BTW, it would be more efficient to split this into two groks.

grok {
   match => {
      "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance} -] %{GREEDYDATA:restOfLine}" ]
   }
}
grok {
   match => {
      "restOfLine" => [
         "^%{WORD:status} to send %{GREEDYDATA}",
         "^%{WORD:status} %{INT:number1} messages %{GREEDYDATA}",
         "^%{WORD:status} %{NOTSPACE} %{NOTSPACE} %{INT:number2} seconds$",
         "^%{WORD:status} sending %{GREEDYDATA}"
      ]
   }
}
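
(The reason this is cheaper: with one grok and four alternatives, every pattern that fails re-scans the whole line, prefix included; with the split, the anchored date/taskid prefix is matched once and the alternatives only have to look at the short restOfLine.)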

(Kenneth Fribert) #3

Hi Badger

Wow, thank you for the optimization. That looks very cool.
Very clever splitting the matching in two; I'm going to use that in another filter I'm working on.
As I read your suggestion, $ doesn't have any effect after GREEDYDATA, which makes perfect sense.

I changed the time pattern, as it seems it won't match when the hour is a single digit.

Current filter:

filter {
  if "portraitsending" in [fields][logtype] {
     grok {
       match => {
          "message" => [ "^ %{NOTSPACE:date}%{SPACE}%{NOTSPACE:clock} %{WORD:taskid} \[%{WORD:instance} -] %{GREEDYDATA:restOfLine}"]
       }
     }
     grok {
       match => {
        "restOfLine" => [
          "%{WORD:status} to send %{GREEDYDATA}",
          "%{WORD:status} %{INT:number1} messages %{GREEDYDATA}",
          "%{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$",
          "%{WORD:status} sending %{GREEDYDATA}"
          ]
        }
     }

     mutate {
       add_field => {
         "datetime" => "%{date} %{clock}"
       }
       remove_field => ["date", "time"]
       rename => {"instance" => "Instance name"}
     }
     date {
       match => [ "datetime", "yyyy-MM-dd H:mm:ss.SSS",  "yyyy-MM-dd HH:mm:ss.SSS"]
     }
  }
}

#############################################


output {
  stdout { codec => rubydebug }

#  elasticsearch {
#    hosts => "elasticsearch:9200"
#    index => "%{[fields][logtype]}-%{[@metadata][version]}-%{+YYYY.MM}"
#    document_type => "%{[@metadata][type]}"
#  }

}

I've tried sending a single line to Logstash (Logstash 6.5.1):

2019-01-07 8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7

and that shows me:

logstashdebug     | [2019-01-14T12:33:20,895][DEBUG][logstash.pipeline        ] filter received {"event"=>{"fields"=>{"logtype"=>"portraitsending"}, "beat"=>{"name"=>"BERLIN40001", "version"=>"6.5.1", "hostname"=>"BERLIN40001"}, "input"=>{"type"=>"log"}, "source"=>"C:\\test\\Starting.txt", "prospector"=>{"type"=>"log"}, "@timestamp"=>2019-01-14T12:33:19.500Z, "message"=>"  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7", "host"=>{"name"=>"BERLIN40001"}, "@version"=>"1", "offset"=>0, "tags"=>["SENDING", "beats_input_codec_plain_applied"]}}
logstashdebug     | [2019-01-14T12:33:20,934][DEBUG][logstash.filters.grok    ] Running grok filter {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,974][DEBUG][logstash.filters.grok    ] Event now:  {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,987][DEBUG][logstash.filters.grok    ] Running grok filter {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,989][DEBUG][logstash.filters.grok    ] Event now:  {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:21,046][DEBUG][logstash.util.decorators ] filters/LogStash::Filters::Mutate: adding value to field {"field"=>"datetime", "value"=>["%{date} %{clock}"]}
logstashdebug     | [2019-01-14T12:33:21,054][DEBUG][logstash.filters.mutate  ] filters/LogStash::Filters::Mutate: removing field {:field=>"date"}
logstashdebug     | [2019-01-14T12:33:21,055][DEBUG][logstash.filters.mutate  ] filters/LogStash::Filters::Mutate: removing field {:field=>"time"}
logstashdebug     | [2019-01-14T12:33:21,087][DEBUG][logstash.pipeline        ] output received {"event"=>{"fields"=>{"logtype"=>"portraitsending"}, "beat"=>{"name"=>"BERLIN40001", "version"=>"6.5.1", "hostname"=>"BERLIN40001"}, "input"=>{"type"=>"log"}, "source"=>"C:\\test\\Starting.txt", "prospector"=>{"type"=>"log"}, "@timestamp"=>2019-01-14T12:33:19.500Z, "message"=>"  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7", "host"=>{"name"=>"BERLIN40001"}, "@version"=>"1", "offset"=>0, "datetime"=>"%{date} %{clock}", "tags"=>["SENDING", "beats_input_codec_plain_applied", "_grokparsefailure", "_dateparsefailure"]}}
logstashdebug     | [2019-01-14T12:33:22,632][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x6aef96ba sleep>"}
logstashdebug     | {
logstashdebug     |         "fields" => {
logstashdebug     |         "logtype" => "portraitsending"
logstashdebug     |     },
logstashdebug     |           "beat" => {
logstashdebug     |             "name" => "BERLIN40001",
logstashdebug     |          "version" => "6.5.1",
logstashdebug     |         "hostname" => "BERLIN40001"
logstashdebug     |     },
logstashdebug     |          "input" => {
logstashdebug     |         "type" => "log"
logstashdebug     |     },
logstashdebug     |         "source" => "C:\\test\\Starting.txt",
logstashdebug     |     "prospector" => {
logstashdebug     |         "type" => "log"
logstashdebug     |     },
logstashdebug     |     "@timestamp" => 2019-01-14T12:33:19.500Z,
logstashdebug     |        "message" => "  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7",
logstashdebug     |           "host" => {
logstashdebug     |         "name" => "BERLIN40001"
logstashdebug     |     },
logstashdebug     |       "@version" => "1",
logstashdebug     |         "offset" => 0,
logstashdebug     |       "datetime" => "%{date} %{clock}",
logstashdebug     |           "tags" => [
logstashdebug     |         [0] "SENDING",
logstashdebug     |         [1] "beats_input_codec_plain_applied",
logstashdebug     |         [2] "_grokparsefailure",
logstashdebug     |         [3] "_dateparsefailure"
logstashdebug     |     ]
logstashdebug     | }

So as I read it, the grok doesn't place any values in the fields?


#4

You need 2 spaces at the start, not one. Or perhaps start the pattern with "^%{SPACE}%{NOTSPACE:date}%{SPACE}" so it tolerates any leading whitespace.
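
Applied to your first grok, that would look like this (a sketch; only the anchor changes):

grok {
   match => {
      "message" => [ "^%{SPACE}%{NOTSPACE:date}%{SPACE}%{NOTSPACE:clock} %{WORD:taskid} \[%{WORD:instance} -] %{GREEDYDATA:restOfLine}" ]
   }
}

%{SPACE} matches any run of whitespace (including none), so it tolerates the two-space indentation and any stray extra spaces.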


(Kenneth Fribert) #5

Doooh, I've been staring at this so long I went completely blind :laughing:

It works like a charm now WOW!

Now I'll test the other lines, and then I guess I'll add a drop on _grokparsefailure for all the uninteresting lines.
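
Probably something like this (untested sketch):

if "_grokparsefailure" in [tags] {
   drop { }
}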


(Kenneth Fribert) #6

Ok, so I got it to interpret the 4 lines as it should.

filter {
  if "portraitsending" in [fields][logtype] {
     grok {
       match => {
          "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{NOTSPACE:clock} %{WORD:taskid} \[%{WORD:instance} -] %{WORD:status} %{GREEDYDATA:restOfLine}"]
       }
     }
     grok {
       match => {
        "restOfLine" => [
          "to send %{GREEDYDATA}",
          "%{INT:batchsent} messages %{GREEDYDATA}",
          "DialogServer.SendMessages last %{INT:batchduration} seconds$",
          "sending %{GREEDYDATA}"
          ]
        }
     }

     mutate {
       add_field => {
         "datetime" => "%{date} %{clock}"
       }
       remove_field => [ "date", "clock", "restOfLine" ]
       rename => { "instance" => "Instance name" }
     }
     date {
       match => [ "datetime", "yyyy-MM-dd H:mm:ss.SSS",  "yyyy-MM-dd HH:mm:ss.SSS"]
     }

  }
}

Now that I have the four lines, I have to look into the aggregate plugin; that looks complicated, and I'm not sure how to debug it.
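
From the aggregate docs, I'm guessing the shape would be something like this, keyed on taskid ('total_sent' is just a name I made up, and this is completely untested):

if [status] == "Starting" {
   aggregate {
      task_id => "%{taskid}"
      code => "map['total_sent'] = 0"
      map_action => "create"
   }
}
if [status] == "Sent" {
   aggregate {
      task_id => "%{taskid}"
      code => "map['total_sent'] += event.get('batchsent').to_i"
      map_action => "update"
   }
}
if [status] == "Completed" {
   aggregate {
      task_id => "%{taskid}"
      # copy the running total onto the final event and close the task
      code => "event.set('total_sent', map['total_sent'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
   }
}

(One thing I did pick up from the docs: aggregate only works correctly with a single pipeline worker, i.e. pipeline.workers: 1.)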