Grok patterns not executing properly in Logstash filter

(Tharaka) #1

I have the following log, which has to be normalized using grok patterns:

<46>Sep 29 12:10:36 SXX-XX-XO SFIMS: [133:51:1] dcerpc2: SMB - Outstanding requests with the same MID [Impact: Currently Not Vulnerable] From \"XXX-XO-02\" at Fri Sep 29 12:10:34 2017 UTC [Classification: Potentially Bad Traffic] [Priority: 2] {tcp} (unknown)-> (unknown)

I wrote the following grok filter to parse the above log in Logstash:

filter {

	grok {
		match => [ "message", "<%{POSINT:pri_id}>%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:hostname} %{WORD:source}: \[%{DATA:num}\] %{GREEDYDATA:signature} \[Impact: %{DATA:impact}\] From \\"%{DATA:device}\\" %{WORD:seq} %{WORD:day} %{SYSLOGTIMESTAMP:trigger_timestamp} %{DATA:list_year} %{WORD:time_zone} \[Classification: %{GREEDYDATA:classification}\] \[Priority: %{NUMBER:priority}\] \{%{DATA:protocol}\} (?<srcip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<srcport>[0-9]+|N/A) \(%{DATA:srcname}\)->(?<dstip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<dstport>[0-9]+|N/A) \(%{DATA:dstname}\)"]
	}

	mutate {
		remove_field => [ "pri_id", "num", "seq", "day", "time_zone, "list_year" ]
	}
}

But when I run this .conf file it returns the following error:

Sending Logstash's logs to /etc/logstash-5.2.2/logs which is now configured via
[2017-10-02T12:14:35,554][ERROR][logstash.agent           ] Cannot load an invalid configuration {:reason=>"Expected one of #, {, ,, ] at line 14, column 65 (byte 727) after filter {\r\n\r\n\tgrok {\r\n\t\tmatch => [ \"message\", \"<%{POSINT:pri_id}>%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:hostname} %{WORD:source}: \\[%{DATA:num}\\] %{GREEDYDATA:signature} \\[Impact: %{DATA:impact}\\] From \\\\\"%{DATA:device}\\\\\" %{WORD:seq} %{WORD:day} %{SYSLOGTIMESTAMP:trigger_timestamp} %{DATA:list_year} %{WORD:time_zone} \\[Classification: %{GREEDYDATA:classification}\\] \\[Priority: %{NUMBER:priority}\\] \\{%{DATA:protocol}\\} (?<srcip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<srcport>[0-9]+|N/A) \\(%{DATA:srcname}\\)->(?<dstip>[0-9]+.[0-9]+.[0-9]+.[0-9]+|N/A):(?<dstport>[0-9]+|N/A) \\(%{DATA:dstname}\\)\"]\r\n\t}\t\r\n\r\n\tmutate {\r\n\t\tremove_field => [ \"pri_id\", \"num\", \"seq\", \"day\", \"time_zone, \""}

But when I test the same expression in the grok debugger, it executes successfully and displays the results.

Need help to sort out the issue.

(Paris Mermigkas) #2

The error log shows where exactly the configuration breaks. You forgot an enclosing quote on the time_zone field in

mutate {
	remove_field => [ "pri_id", "num", "seq", "day", "time_zone, "list_year" ]
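
With the missing quote restored (and the block closed), it would read:

```
	mutate {
		remove_field => [ "pri_id", "num", "seq", "day", "time_zone", "list_year" ]
	}
```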

(Tharaka) #3

Sorry, that was my mistake; I have corrected the enclosing quote on time_zone. But after correcting it, the event is tagged "_grokparsefailure" as below:

"@timestamp" => 2017-10-02T09:39:09.196Z,
  "@version" => "1",
      "host" => "",
   "message" => "<46>Oct  2 09:38:29 SXX-XX-XO SFIMS: [1:34463:3] \"APP-DETECT TeamViewer remote administration tool outbound connection attempt\" [Impact: Potentially Vulnerable] From \"XXX-XO-02\" at Mon Oct  2 09:38:28 2017 UTC [Classification: Potential Corporate Policy Violation] [Priority: 1] {tcp} (unknown)-> (unknown)",
      "tags" => [
    [0] "_grokparsefailure"

(Magnus Bäck) #4

Start with the simplest possible expression (<%{POSINT:pri_id}>) and make sure that works. Continue to add more and more until you've found where it breaks.
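
A minimal sketch of that approach, keeping a GREEDYDATA tail as a placeholder while the front of the pattern is verified (the rest field name is only an illustration):

```
filter {
	grok {
		# Step 1: match only the syslog priority, capture everything else
		match => [ "message", "<%{POSINT:pri_id}>%{GREEDYDATA:rest}" ]
	}
}
```

Once events stop being tagged _grokparsefailure, move the next token (the timestamp, then the hostname, and so on) out of rest and into its own pattern.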

(Tharaka) #5


As you said, I started building the grok pattern piece by piece and fully tested it with sample logs, and it correctly extracted all the tested data.

But when I run the same grok pattern in the real environment, it gives a grok failure.

(Magnus Bäck) #6

Start with the simplest possible expression in Logstash and build from there. Testing expressions in the grok debugger is useful but if you want it to work in Logstash that's where you should test things.
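
One way to test inside Logstash itself is a throwaway pipeline that reads lines from stdin and prints the parsed event (a sketch; the pattern shown is only the first step, not the full expression):

```
input { stdin { } }

filter {
	grok {
		match => [ "message", "<%{POSINT:pri_id}>%{SYSLOGTIMESTAMP:log_timestamp} %{GREEDYDATA:rest}" ]
	}
}

output { stdout { codec => rubydebug } }
```

Run it with bin/logstash -f test.conf, paste a sample log line, and check whether the printed event carries the expected fields or a _grokparsefailure tag.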

Now, having a quick look at your grok expression you use multiple GREEDYDATA and DATA patterns. This is a really really bad idea and even if it isn't the cause for your current issue it will result in poor performance. Any grok expression with more than one GREEDYDATA or DATA is very likely inefficient.
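
For example, a few of the DATA captures in the expression above could be replaced by more anchored stock patterns (a sketch; the gid/sid/rev field names are my assumption about the [133:51:1] triplet):

```
# instead of \[%{DATA:num}\]
\[%{INT:gid}:%{INT:sid}:%{INT:rev}\]

# instead of \{%{DATA:protocol}\}
\{%{WORD:protocol}\}

# instead of %{DATA:list_year}
%{YEAR:list_year}
```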

(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.