Grok patterns not executing properly in Logstash filter

I have the following log, which has to be normalized using grok patterns:

<46>Sep 29 12:10:36 SXX-XX-XO SFIMS: [133:51:1] dcerpc2: SMB - Outstanding requests with the same MID [Impact: Currently Not Vulnerable] From \"XXX-XO-02\" at Fri Sep 29 12:10:34 2017 UTC [Classification: Potentially Bad Traffic] [Priority: 2] {tcp} 192.168.1.88:55422 (unknown)->172.2.2.1:445 (unknown)

I wrote the following grok filter for the above log in Logstash:

filter {

grok {
	match => [ "message", "<%{POSINT:pri_id}>%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:hostname} %{WORD:source}: \[%{DATA:num}\] %{GREEDYDATA:signature} \[Impact: %{DATA:impact}\] From \\"%{DATA:device}\\" %{WORD:seq} %{WORD:day} %{SYSLOGTIMESTAMP:trigger_timestamp} %{DATA:list_year} %{WORD:time_zone} \[Classification: %{GREEDYDATA:classification}\] \[Priority: %{NUMBER:priority}\] \{%{DATA:protocol}\} (?<srcip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<srcport>[0-9]+|N/A) \(%{DATA:srcname}\)->(?<dstip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<dstport>[0-9]+|N/A) \(%{DATA:dstname}\)"]
}	

mutate {
	remove_field => [ "pri_id", "num", "seq", "day", "time_zone, "list_year" ]
  }
}

But when I run this .conf file it returns the following error:

Sending Logstash's logs to /etc/logstash-5.2.2/logs which is now configured via log4j2.properties
[2017-10-02T12:14:35,554][ERROR][logstash.agent           ] Cannot load an invalid configuration {:reason=>"Expected one of #, {, ,, ] at line 14, column 65 (byte 727) after filter {\r\n\r\n\tgrok {\r\n\t\tmatch => [ \"message\", \"<%{POSINT:pri_id}>%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:hostname} %{WORD:source}: \\[%{DATA:num}\\] %{GREEDYDATA:signature} \\[Impact: %{DATA:impact}\\] From \\\\\"%{DATA:device}\\\\\" %{WORD:seq} %{WORD:day} %{SYSLOGTIMESTAMP:trigger_timestamp} %{DATA:list_year} %{WORD:time_zone} \\[Classification: %{GREEDYDATA:classification}\\] \\[Priority: %{NUMBER:priority}\\] \\{%{DATA:protocol}\\} (?<srcip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<srcport>[0-9]+|N/A) \\(%{DATA:srcname}\\)->(?<dstip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<dstport>[0-9]+|N/A) \\(%{DATA:dstname}\\)\"]\r\n\t}\t\r\n\r\n\tmutate {\r\n\t\tremove_field => [ \"pri_id\", \"num\", \"seq\", \"day\", \"time_zone, \""}

But when I run the same grok pattern in https://grokdebug.herokuapp.com/ it executes successfully and displays the results.

I need help sorting out this issue.

The error log shows exactly where the configuration breaks: you're missing a closing quote on the time_zone field in

mutate {
	remove_field => [ "pri_id", "num", "seq", "day", "time_zone, "list_year" ]
  }
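For reference, the corrected mutate block, with the closing quote added after time_zone, looks like this:

```
mutate {
  remove_field => [ "pri_id", "num", "seq", "day", "time_zone", "list_year" ]
}
```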

Sorry, that was my mistake, and I have corrected the enclosing quote on time_zone. But after correcting it, the filter gives a "_grokparsefailure", as below:

{
"@timestamp" => 2017-10-02T09:39:09.196Z,
  "@version" => "1",
      "host" => "192.168.50.15",
   "message" => "<46>Oct  2 09:38:29 SXX-XX-XO SFIMS: [1:34463:3] \"APP-DETECT TeamViewer remote administration tool outbound connection attempt\" [Impact: Potentially Vulnerable] From \"XXX-XO-02\" at Mon Oct  2 09:38:28 2017 UTC [Classification: Potential Corporate Policy Violation] [Priority: 1] {tcp} 192.168.1.111:51523 (unknown)->192.168.1.80:8080 (unknown)",
      "tags" => [
    [0] "_grokparsefailure"
]
}

Start with the simplest possible expression (<%{POSINT:pri_id}>) and make sure that works. Continue to add more and more until you've found where it breaks.
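As a sketch of that approach (the stdin/stdout plugins here are only an assumption for quick local testing; your real pipeline will use different inputs and outputs), the starting configuration could be:

```
input { stdin {} }

filter {
  grok {
    # Step 1: match only the syslog priority and capture everything else.
    # Once this works, move one field at a time out of "rest" into its own pattern.
    match => [ "message", "<%{POSINT:pri_id}>%{GREEDYDATA:rest}" ]
  }
}

output { stdout { codec => rubydebug } }
```

Paste a sample log line on stdin, confirm pri_id is extracted without a _grokparsefailure tag, then extend the pattern step by step until you find where it breaks.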

@magnusbaeck,

As you said, I started building the grok pattern up one piece at a time and tested it fully in https://grokdebug.herokuapp.com/ with sample logs; it correctly extracted all of the tested data.

But when I run the same grok pattern in the real environment it gives a grok parse failure.

Start with the simplest possible expression in Logstash and build from there. Testing expressions in the grok debugger is useful but if you want it to work in Logstash that's where you should test things.

Now, having a quick look at your grok expression, you use multiple GREEDYDATA and DATA patterns. This is a really, really bad idea, and even if it isn't the cause of your current issue, it will result in poor performance. Any grok expression with more than one GREEDYDATA or DATA pattern is very likely inefficient.
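For example, each bracketed field in the pattern above can be matched with a character class that stops at the closing bracket instead of DATA. This is only a sketch of the idea, not a drop-in replacement for the full pattern:

```
grok {
  # [^\]]* cannot run past the closing bracket, so the regex engine never
  # has to backtrack the way it must with %{DATA}/%{GREEDYDATA}.
  match => [ "message", "\[Impact: (?<impact>[^\]]*)\] From" ]
}
```

The same idea applies to the Classification field, and the hand-written IP regexes could use the standard %{IP} pattern, which is more precise than [0-9]+.[0-9]+ with unescaped dots.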


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.