Logstash-filter-verifier vs grok debugger differences

Hi I am using logstash 5.2 and the latest logstash-filter-verifier and am noticing a difference between the output of the logstash-filter-verifier and what I submitted to the grok debugger.

Grok Debugger input:
"2017-03-29 15:48:48.872 8698 WARNING oslo.service.loopingcall [req-aa38d6b9-c72a-4fbe-b8eb-3340ca9cf353 - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 0.01 sec"

Grok Debugger output:
https://pastebin.com/s8jhpF1k

Grok Debugger filter:
"%{TIMESTAMP_ISO8601:[@metadata][timestamp]}%{SPACE}%{NUMBER:pid}%{SPACE}?%{LOGLEVEL:loglevel} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}? (?:\[\-\]|\[(?:(%{NOTSPACE:request_id}|\-) (%{NOTSPACE:user}|\-) (%{NOTSPACE:tenant}|\-) (%{NOTSPACE:domain}|\-) (%{NOTSPACE:user_domain}|\-) (%{NOTSPACE:project_domain}|\-)\]))?%{SPACE}?(?:%{IP:instance}|(?<instance>\h{8}-\h{4}-\h{4}-\h{4}-\h{12}))?%{SPACE}?%{DATA:logmessage}%{SPACE}?(?<msg_data>%{NOTSPACE}=%{GREEDYDATA})?"

Logstash-filter-verifier input line:
"2017-03-29 15:48:48.872 8698 WARNING oslo.service.loopingcall [req-aa38d6b9-c72a-4fbe-b8eb-3340ca9cf353 - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 0.01 sec",

Logstash-filter-verifier conf:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:[@metadata][timestamp]}%{SPACE}%{NUMBER:pid}%{SPACE}?%{LOGLEVEL:loglevel} [?\b%{NOTSPACE:module}\b]?%{SPACE}? (?:[-]|[(?:(%{NOTSPACE:request_id}|-) (%{NOTSPACE:user}|-) (%{NOTSPACE:tenant}|-) (%{NOTSPACE:domain}|-) (%{NOTSPACE:user_domain}|-) (%{NOTSPACE:project_domain}|-)]))?%{SPACE}?(?:%{IP:instance}|(?\h{8}-\h{4}-\h{4}-\h{4}-\h{12}))?%{SPACE}?%{DATA:logmessage}%{SPACE}?(?<msg_data>%{NOTSPACE}=%{GREEDYDATA})?" }
add_field => { "received_at" => "%{@timestamp}" }
}
kv {
source => "msg_data"
field_split => " ,"
}
date { match => [ "[@metadata][timestamp]", "YYYY-MM-DD HH:mm:ss.sss", "ISO8601"] }
if [module] == "iso8601.iso8601" {
drop {}
}
}

Logstash-filter-verifier output:
Comparing message 2 of 100 from neutron.json...
--- /tmp/473250745/neutron.json/2/expected 2017-07-24 19:40:50.065109719 +0000
+++ /tmp/473250745/neutron.json/2/actual 2017-07-24 19:40:50.065109719 +0000
@@ -1,3 +1,15 @@
{
- "host": "cfa"
+ "@timestamp": "2017-03-29T15:48:48.872Z",
+ "domain": "-",
+ "host": "cfa",
+ "loglevel": "WARNING",
+ "message": "2017-03-29 15:48:48.872 8698 WARNING oslo.service.loopingcall [req-aa38d6b9-c72a-4fbe-b8eb-3340ca9cf353 - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 0.01 sec",
+ "module": "oslo.service.loopingcall",
+ "pid": "8698",
+ "project_domain": "-",
+ "received_at": "2017-07-24T19:32:08.853Z",
+ "request_id": "req-aa38d6b9-c72a-4fbe-b8eb-3340ca9cf353",
+ "tenant": "-",
+ "user": "-",
+ "user_domain": "-"
}
I was expecting the logstash-filter-verifier to be the same as the grok debugger output - specifically that the logmessage would have been captured as:
"logmessage": [
[
"Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 0.01 sec"
]
],

What am I missing, or should it be the same?

I see I copied the wrong grok filter for the LFV.conf- it should have had(?<msg_data>%{NOTSPACE}=%{GREEDYDATA}) in it which matches the grok debugger string.
I think I found out that the difference is that the grok debugger includes "$" as a valid endpoint for %{DATA} where as logstash %{DATA} filter, which is identified as the same ".*?", does not include "$" as a valid end point (for the "?").
By reordering the msg_data string and changing the %{DATA:logmessage} to(?<logmessage>[A-Za-z0-9' .:_-&&[^=]]+)) I think I have a workaround that works.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.