Parsing a varying length line with custom grok filter, only returns the first field

anar · November 26, 2019, 2:33pm

Hello,

I'm trying to set up a custom grok filter for my data input, but when I test it in Kibana's Grok Debugger, I only get the value for the first field (field1). I'm using grok instead of the csv parser because after field7 the last data field is varying length, and it should just be a single entry in Logstash (I will do some post processing on it afterwards).

My data looks like this:

91877900$|$11613428$|$DEVICE$|$CUSTOM-DEVICE1$|$UTC+02:00$|$["13","19","24","53","60","61","65","66","67","8","1"]$|$title=News$|$genre=News Broadcast$|$startTime=1574190000000$|$programId=659107083$|

My grok pattern looks like this:

%{INT:field1}\$|$ %{INT:field2}\$|$ %{WORD:field3}\$|$ %{DATA:field4}\$|$ %{DATA:field5}\$|$ %{TZ:field6}\$|$ %{GREEDYDATA:field7}\$|$ %{GREEDYDATA:theRestOfIt}

Can someone help with this? I'm getting stuck on this part, and I don't understand why my output looks like this:

{
  "field1": "91877900"
}

Badger · November 26, 2019, 2:45pm

You need to escape all of the $ and | with \

| is used for alternation -- foo|bar matches either foo or bar, so your pattern matches any one of

%{INT:field1}\$
$ %{INT:field2}\$
$ %{WORD:field3}\$
etc.

So once it matches the first INT, it does not check the rest of the patterns.
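
The alternation behaviour described above can be reproduced outside Logstash. This is a minimal Python sketch, not grok itself: it assumes that %{INT} is roughly [+-]?[0-9]+ and %{WORD} is roughly \w+, and it uses Python's re module in place of the regex engine grok uses. The variable names are illustrative only.

```python
import re

line = ('91877900$|$11613428$|$DEVICE$|$CUSTOM-DEVICE1$|$UTC+02:00$|$'
        '["13","19","24","53","60","61","65","66","67","8","1"]$|$'
        'title=News$|$genre=News Broadcast$|$startTime=1574190000000$|$'
        'programId=659107083$|')

# Approximation of the start of the original pattern:
#   %{INT:field1}\$|$ %{INT:field2}\$|$ %{WORD:field3}\$
unescaped = (r'(?P<field1>[+-]?[0-9]+)\$'
             r'|$ (?P<field2>[+-]?[0-9]+)\$'
             r'|$ (?P<field3>\w+)\$')

# Each unescaped | starts a new, independent alternative.
branches = unescaped.split('|')
for branch in branches:
    print('alternative:', branch)

# The first alternative matches at the start of the line, so the
# other alternatives are never needed and their groups stay empty.
match = re.search(unescaped, line)
captured = {k: v for k, v in match.groupdict().items() if v is not None}
print(captured)
```

Only field1 is captured, which mirrors the Grok Debugger output in the question. The second and third alternatives begin with an unescaped $ (end of line) followed by more text, so they could not match this input in any case.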

anar · November 26, 2019, 2:55pm

Thank you for the quick reply, Badger.

I tried escaping the $ and | with \, but if I run
%{INT:field1}\$ \|\$%{INT:field2}\$
or
%{INT:field1}\$ \$%{INT:field2}\$
on my input, I get a "Provided Grok patterns do not match data in the input" error.

I also get the same error if I try:
%{INT:field1}\$\|\$ %{INT:field2}\$\|\$ %{WORD:field3}\$\|\$ %{DATA:field4}\$\|\$ %{DATA:field5}\$\|\$ %{TZ:field6}\$\|\$ %{GREEDYDATA:field7}\$\|\$ %{GREEDYDATA:theRestOfIt}

Am I missing something?

Badger · November 26, 2019, 4:32pm

Remove all the spaces and replace TZ with DATA.

input {
    generator {
        count => 1
        lines => [ '91877900$|$11613428$|$DEVICE$|$CUSTOM-DEVICE1$|$UTC+02:00$|$["13","19","24","53","60","61","65","66","67","8","1"]$|$title=News$|$genre=News Broadcast$|$startTime=1574190000000$|$programId=659107083$|' ]
    }
}
filter {
    grok { match => { "message" => "%{INT:field1}\$\|\$%{INT:field2}\$\|\$%{WORD:field3}\$\|\$%{DATA:field4}\$\|\$%{DATA:field5}\$\|\$%{DATA:field6}\$\|\$%{GREEDYDATA:field7}\$\|\$%{GREEDYDATA:theRestOfIt}" } }
}
output { stdout { codec => rubydebug { metadata => false } } }

produces

     "field6" => "[\"13\",\"19\",\"24\",\"53\",\"60\",\"61\",\"65\",\"66\",\"67\",\"8\",\"1\"]",
     "field1" => "91877900",
"theRestOfIt" => "programId=659107083$|",
     "field7" => "title=News$|$genre=News Broadcast$|$startTime=1574190000000",

etc.
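
For anyone who wants to check the corrected pattern without a Logstash instance, here is a rough Python equivalent. It is only a sketch under assumptions: the named groups approximate INT, WORD, DATA and GREEDYDATA as [+-]?[0-9]+, \b\w+\b, .*? and .* respectively, and Python's re module stands in for grok's regex engine. It also shows why the earlier attempt with a space after each separator did not match.

```python
import re

line = ('91877900$|$11613428$|$DEVICE$|$CUSTOM-DEVICE1$|$UTC+02:00$|$'
        '["13","19","24","53","60","61","65","66","67","8","1"]$|$'
        'title=News$|$genre=News Broadcast$|$startTime=1574190000000$|$'
        'programId=659107083$|')

SEP = r'\$\|\$'  # the literal separator "$|$", with $ and | escaped

parts = [
    r'(?P<field1>[+-]?[0-9]+)',   # stand-in for %{INT:field1}
    r'(?P<field2>[+-]?[0-9]+)',   # stand-in for %{INT:field2}
    r'(?P<field3>\b\w+\b)',       # stand-in for %{WORD:field3}
    r'(?P<field4>.*?)',           # stand-in for %{DATA:field4}
    r'(?P<field5>.*?)',           # stand-in for %{DATA:field5}
    r'(?P<field6>.*?)',           # stand-in for %{DATA:field6}
    r'(?P<field7>.*)',            # stand-in for %{GREEDYDATA:field7}
    r'(?P<theRestOfIt>.*)',       # stand-in for %{GREEDYDATA:theRestOfIt}
]

fixed = SEP.join(parts)            # no spaces between separator and field
spaced = (SEP + ' ').join(parts)   # a space after every separator

fields = re.search(fixed, line).groupdict()
for name, value in fields.items():
    print(f'{name} => {value}')

# The input never has a space after "$|$", so this search finds nothing.
print(re.search(spaced, line))
```

Because field7 is greedy, it swallows everything up to the last complete $|$ in the line, which is why theRestOfIt ends up holding only the final programId entry.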

anar · November 27, 2019, 11:52am

Thanks a lot for the help, that solved it!

system · December 25, 2019, 11:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
