Can someone help me with grok? I can't figure out how to make it work.

I have this type of logfile:

Events of MHSendMessagesService

                               * STOP *

                          0320 *** START ***
                                 * Module: C:\Program Files (x86)\PST\Million Handshakes\Dialog Server\MHSendMessagesService.exe
                                 * Version: 6.1.4.931
                                 * Computer: DELTA40001
                                 * UserName: PDsvc-DELTA40001
                                 * Address IP: 10.100.1.130
   2019-01-07  8:36:19.843 1390 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  8:36:19.844 1390 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  8:36:20.763 1390 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  8:36:20.763 1390 [DELTA -] Sent 1 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  8:36:20.764 1390 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  8:36:20.787 1390 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  8:36:20.788 1390 [DELTA -] Completed sending messages on output channel 1001, queue 0
  2019-01-07  9:21:18.409 13C4 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  9:21:18.410 13C4 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:21:19.498 13C4 [DELTA -] End DialogServer.SendMessages last 1 seconds
  2019-01-07  9:21:19.499 13C4 [DELTA -] Sent 3 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  9:21:19.500 13C4 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:21:19.517 13C4 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:21:19.518 13C4 [DELTA -] Completed sending messages on output channel 1001, queue 0
  2019-01-07  9:35:24.911 13FC [DELTA -] Starting to send messages on output channel 1022, queue 0
  2019-01-07  9:35:24.911 1484 [DELTA -] Starting to send messages on output channel 1001, queue 0
  2019-01-07  9:35:24.911 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:24.912 1484 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.032 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.033 13FC [DELTA -] Sent 1 messages in transaction 1 on output channel 1022, queue 0
  2019-01-07  9:35:25.034 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.144 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.145 13FC [DELTA -] Sent 1 messages in transaction 2 on output channel 1022, queue 0
  2019-01-07  9:35:25.146 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.225 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.225 13FC [DELTA -] Sent 1 messages in transaction 3 on output channel 1022, queue 0
  2019-01-07  9:35:25.226 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.300 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.301 13FC [DELTA -] Sent 1 messages in transaction 4 on output channel 1022, queue 0
  2019-01-07  9:35:25.302 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.369 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.370 13FC [DELTA -] Sent 1 messages in transaction 5 on output channel 1022, queue 0
  2019-01-07  9:35:25.370 13FC [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:25.388 13FC [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:25.388 13FC [DELTA -] Completed sending messages on output channel 1022, queue 0
  2019-01-07  9:35:26.169 1484 [DELTA -] End DialogServer.SendMessages last 1 seconds
  2019-01-07  9:35:26.169 1484 [DELTA -] Sent 5 messages in transaction 1 on output channel 1001, queue 0
  2019-01-07  9:35:26.170 1484 [DELTA -] Call DialogServer.SendMessages
  2019-01-07  9:35:26.198 1484 [DELTA -] End DialogServer.SendMessages last 0 seconds
  2019-01-07  9:35:26.199 1484 [DELTA -] Completed sending messages on output channel 1001, queue 0

So there are 4 types of lines that I want to react to:

Starting
Sent
End
Completed

Sent and End can appear multiple times within Starting and Completed, and lines from different tasks can be interleaved, so at a later stage I'll have to use the aggregate plugin.
The rest I just want to ignore.

So I wanted to create a grok filter.

First line to handle (the lines all start with two spaces):

2019-01-07 7:06:57.948 0320 [DELTA -] Starting to send messages on output channel 1001, queue 0

This could match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} to send %{GREEDYDATA}$"

Second line to handle:

2019-01-07 7:21:58.554 11EC [DELTA -] End DialogServer.SendMessages last 3 seconds

Match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$"}"

Third line:

2019-01-07 7:37:45.074 13F8 [DELTA -] Sent 15 messages in transaction 1 on output channel 1001, queue 0

Match:
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{INT:number1} messages %{GREEDYDATA}$"

Fourth line:

2019-01-07 9:35:25.388 13FC [DELTA -] Completed sending messages on output channel 1022, queue 0

Match (the same structure as the first):
"^ %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} sending %{GREEDYDATA}$"

I'm a bit unsure of how to do this. Are the matches ordered?
So could I do it like this?

 grok {
   match => {
    "message" => [
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} to send %{GREEDYDATA}$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{INT:number1} messages %{GREEDYDATA}$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$",
      "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{WORD:status} sending %{GREEDYDATA}$"
      ]
    }
 }

I then created a special filebeat config to test this, and when I read the output, I see 2 '_grokparsefailure' tags for each entry.
So as I understand it, it will add a failure tag for each non-matching pattern?
Also, none of the field names I defined in the grok filter are showing up in the debug output.

What on earth am I doing wrong?

No. Provided at least one of the patterns matches a line, you should not get a failed tag.
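
Grok tries the patterns in the order you list them and, since break_on_match defaults to true, stops at the first one that matches; _grokparsefailure is only added (once per grok filter) when none of the patterns match. If you want a distinctive tag to route on later, you can override it, for example (the tag name here is just an illustration):

grok {
  match => {
    "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance} -] %{WORD:status} to send %{GREEDYDATA}" ]
  }
  tag_on_failure => [ "_sendlog_nomatch" ]
}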

When I take one of your lines and run it against your patterns, it matches just fine:

      "time" => "7:06:57.948",
    "taskid" => "0320",
    "status" => "Starting",
   "message" => "  2019-01-07 7:06:57.948 0320 [DELTA -] Starting to send messages on output channel 1001, queue 0",
      "date" => "2019-01-07",
  "instance" => "DELTA"

I suggest using output { stdout { codec => rubydebug } } and carefully examining the incoming message field to make sure it is what you expect.
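
For quick iteration you can also skip filebeat entirely and paste sample lines into a stdin pipeline, for example (a minimal sketch using just the common-prefix grok):

input { stdin { } }
filter {
  grok {
    match => {
      "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance} -] %{GREEDYDATA:restOfLine}" ]
    }
  }
}
output { stdout { codec => rubydebug } }

Run it with bin/logstash -f path/to/this/config and paste a log line (leading spaces included) to see exactly what grok extracts.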

Also, you cannot have a space embedded in a field name (note that the field is called instance, not instance name).

BTW, it would be more efficient to split this into two groks, so the anchored common prefix is only matched once and the four alternatives only have to run against the short restOfLine.

grok {
   match => {
       "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{TIME:time} %{WORD:taskid} \[%{WORD:instance name} -] %{GREEDYDATA:restOfLine}"]
    }
}
grok {
   match => {
       "restOfLine" => [
"^%{WORD:status} to send %{GREEDYDATA}",
"^%{WORD:status} %{INT:number1} messages %{GREEDYDATA}",
"^%{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$",
"^%{WORD:status} sending %{GREEDYDATA}"
        ]
    }
}

Hi Badger

Wow, thank you for the optimization. That looks very cool.
Very clever splitting the matching in two; I'm going to use that in another filter I'm working on.
As I read your suggestion, $ doesn't have any effect after GREEDYDATA, which makes perfect sense.

I changed the time pattern to %{NOTSPACE}, as it seems %{TIME} won't match when the hour is a single digit?

Current filter:

filter {
  if "portraitsending" in [fields][logtype] {
     grok {
       match => {
          "message" => [ "^ %{NOTSPACE:date}%{SPACE}%{NOTSPACE:clock} %{WORD:taskid} \[%{WORD:instance} -] %{GREEDYDATA:restOfLine}"]
       }
     }
     grok {
       match => {
        "restOfLine" => [
          "%{WORD:status} to send %{GREEDYDATA}",
          "%{WORD:status} %{INT:number1} messages %{GREEDYDATA}",
          "%{WORD:status} %{NOTSPACE} %{NOTSPACE} {INT:number2} seconds$",
          "%{WORD:status} sending %{GREEDYDATA}"
          ]
        }
     }

     mutate {
       add_field => {
         "datetime" => "%{date} %{clock}"
       }
       remove_field => ["date", "time"]
       rename => {"instance" => "Instance name"}
     }
     date {
       match => [ "datetime", "yyyy-MM-dd H:mm:ss.SSS",  "yyyy-MM-dd HH:mm:ss.SSS"]
     }
  }
}

#############################################


output {
  stdout { codec => rubydebug }

#  elasticsearch {
#    hosts => "elasticsearch:9200"
#    index => "%{[fields][logtype]}-%{[@metadata][version]}-%{+YYYY.MM}"
#    document_type => "%{[@metadata][type]}"
#  }

}

I've tried sending a single line to Logstash (version 6.5.1):

2019-01-07 8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7

and that shows me:

logstashdebug     | [2019-01-14T12:33:20,895][DEBUG][logstash.pipeline        ] filter received {"event"=>{"fields"=>{"logtype"=>"portraitsending"}, "beat"=>{"name"=>"BERLIN40001", "version"=>"6.5.1", "hostname"=>"BERLIN40001"}, "input"=>{"type"=>"log"}, "source"=>"C:\\test\\Starting.txt", "prospector"=>{"type"=>"log"}, "@timestamp"=>2019-01-14T12:33:19.500Z, "message"=>"  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7", "host"=>{"name"=>"BERLIN40001"}, "@version"=>"1", "offset"=>0, "tags"=>["SENDING", "beats_input_codec_plain_applied"]}}
logstashdebug     | [2019-01-14T12:33:20,934][DEBUG][logstash.filters.grok    ] Running grok filter {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,974][DEBUG][logstash.filters.grok    ] Event now:  {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,987][DEBUG][logstash.filters.grok    ] Running grok filter {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:20,989][DEBUG][logstash.filters.grok    ] Event now:  {:event=>#<LogStash::Event:0x5a3dc8d4>}
logstashdebug     | [2019-01-14T12:33:21,046][DEBUG][logstash.util.decorators ] filters/LogStash::Filters::Mutate: adding value to field {"field"=>"datetime", "value"=>["%{date} %{clock}"]}
logstashdebug     | [2019-01-14T12:33:21,054][DEBUG][logstash.filters.mutate  ] filters/LogStash::Filters::Mutate: removing field {:field=>"date"}
logstashdebug     | [2019-01-14T12:33:21,055][DEBUG][logstash.filters.mutate  ] filters/LogStash::Filters::Mutate: removing field {:field=>"time"}
logstashdebug     | [2019-01-14T12:33:21,087][DEBUG][logstash.pipeline        ] output received {"event"=>{"fields"=>{"logtype"=>"portraitsending"}, "beat"=>{"name"=>"BERLIN40001", "version"=>"6.5.1", "hostname"=>"BERLIN40001"}, "input"=>{"type"=>"log"}, "source"=>"C:\\test\\Starting.txt", "prospector"=>{"type"=>"log"}, "@timestamp"=>2019-01-14T12:33:19.500Z, "message"=>"  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7", "host"=>{"name"=>"BERLIN40001"}, "@version"=>"1", "offset"=>0, "datetime"=>"%{date} %{clock}", "tags"=>["SENDING", "beats_input_codec_plain_applied", "_grokparsefailure", "_dateparsefailure"]}}
logstashdebug     | [2019-01-14T12:33:22,632][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x6aef96ba sleep>"}
logstashdebug     | {
logstashdebug     |         "fields" => {
logstashdebug     |         "logtype" => "portraitsending"
logstashdebug     |     },
logstashdebug     |           "beat" => {
logstashdebug     |             "name" => "BERLIN40001",
logstashdebug     |          "version" => "6.5.1",
logstashdebug     |         "hostname" => "BERLIN40001"
logstashdebug     |     },
logstashdebug     |          "input" => {
logstashdebug     |         "type" => "log"
logstashdebug     |     },
logstashdebug     |         "source" => "C:\\test\\Starting.txt",
logstashdebug     |     "prospector" => {
logstashdebug     |         "type" => "log"
logstashdebug     |     },
logstashdebug     |     "@timestamp" => 2019-01-14T12:33:19.500Z,
logstashdebug     |        "message" => "  2019-01-07  8:31:05.368 0810 [BERLIN_PSL -] Starting to send messages on output channel 1001, queue 7",
logstashdebug     |           "host" => {
logstashdebug     |         "name" => "BERLIN40001"
logstashdebug     |     },
logstashdebug     |       "@version" => "1",
logstashdebug     |         "offset" => 0,
logstashdebug     |       "datetime" => "%{date} %{clock}",
logstashdebug     |           "tags" => [
logstashdebug     |         [0] "SENDING",
logstashdebug     |         [1] "beats_input_codec_plain_applied",
logstashdebug     |         [2] "_grokparsefailure",
logstashdebug     |         [3] "_dateparsefailure"
logstashdebug     |     ]
logstashdebug     | }

So as I read it, grok isn't placing any values in the fields?

You need 2 spaces at the start, not one. Or perhaps "^%{SPACE}%{NOTSPACE:date}%{SPACE}", since %{SPACE} matches any run of whitespace (including none).


Doooh, I've been staring myself completely blind on this :laughing:

It works like a charm now, WOW!

Now I'll test the other lines, and then I guess I'll add a drop for the _grokparsefailure tag to get rid of all the uninteresting lines.
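
Something like this should work for the drop (a sketch; it discards any event that neither grok matched):

filter {
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}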

OK, so I got it to interpret the 4 lines as it should.

filter {
  if "portraitsending" in [fields][logtype] {
     grok {
       match => {
          "message" => [ "^  %{NOTSPACE:date}%{SPACE}%{NOTSPACE:clock} %{WORD:taskid} \[%{WORD:instance} -] %{WORD:status} %{GREEDYDATA:restOfLine}"]
       }
     }
     grok {
       match => {
        "restOfLine" => [
          "to send %{GREEDYDATA}",
          "%{INT:batchsent} messages %{GREEDYDATA}",
          "DialogServer.SendMessages last %{INT:batchduration} seconds$",
          "sending %{GREEDYDATA}"
          ]
        }
     }

     mutate {
       add_field => {
         "datetime" => "%{date} %{clock}"
       }
       remove_field => [ "date", "clock", "restOfLine" ]
       rename => { "instance" => "Instance name" }
     }
     date {
       match => [ "datetime", "yyyy-MM-dd H:mm:ss.SSS",  "yyyy-MM-dd HH:mm:ss.SSS"]
     }

  }
}

Now that I have the four lines, I have to look into the aggregate plugin, which looks complicated. Not sure how to debug it.
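Going by the aggregate filter's documentation, I'm guessing something like this (untested; it assumes each taskid identifies one run at a time, and that pipeline.workers is set to 1, which aggregate requires):

filter {
  if [status] == "Starting" {
    aggregate {
      task_id => "%{taskid}"
      code => "map['total_sent'] = 0"
      map_action => "create"
    }
  }
  if [status] == "Sent" {
    aggregate {
      task_id => "%{taskid}"
      code => "map['total_sent'] += event.get('batchsent').to_i"
      map_action => "update"
    }
  }
  if [status] == "Completed" {
    aggregate {
      task_id => "%{taskid}"
      code => "event.set('total_sent', map['total_sent'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}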
