Grok to parse CEF extension fields

leandrojmp · January 5, 2018, 7:04pm

Hello,

I'm trying to create a grok pattern to parse the extension fields in CEF messages from an antivirus server.

My problem right now is that the same field can hold different types of data: sometimes it is an integer, other times a word, and other times a message or a version with major and minor numbers.

Also, sometimes I do not have all the fields, but I can use ( )? to make a field optional.

Something like this:

cs2=KES cs2Label=ProductName cs3=10.2.4.0 cs3Label=ProductVersion cs5=Install update - Service Pack 1 MR 2 cs5Label=TaskName cs4=159 cs4Label=TaskId cn2=4 cn2Label=TaskNewState cn1=1 cn1Label=TaskOldState

 cs2=1093 cs2Label=ProductName cs3=1.0.0.0 cs3Label=ProductVersion

If, for example, I use (cs2=%{WORD:cs2.id})? it will match the first line for the field cs2, but not the second. If I use INT instead of WORD, it will match the second line for cs2. If I use DATA, nothing is matched, and if I use GREEDYDATA, the whole rest of the message ends up in the first field that appears in the message, in this case cs2.

Does anyone have any idea how to solve this parsing problem?

From what I saw in the logs, the values in the fields can be an integer, a word, a software version, a message with spaces, or a filename. Trying GREEDYDATA does not work, since it swallows all the other fields that come after the match.
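The behaviour described above follows from how the stock grok patterns are defined: GREEDYDATA is `.*`, DATA is `.*?`, and NOTSPACE is `\S+`. A minimal Ruby sketch, using plain regexes as stand-ins for those grok patterns (an assumption made for illustration; grok itself is not involved here), reproduces the three outcomes on a shortened version of the first sample line:

```ruby
# Shortened sample extension taken from the first log line above.
line = 'cs2=KES cs2Label=ProductName cs3=10.2.4.0 cs3Label=ProductVersion'

# GREEDYDATA equivalent: .* runs to the end of the line,
# so every following field ends up inside cs2.
greedy = line[/cs2=(.*)/, 1]

# DATA equivalent: .*? is lazy, and with nothing required after it
# the shortest possible match is the empty string.
lazy = line[/cs2=(.*?)/, 1]

# NOTSPACE equivalent: \S+ stops at the first space, which works for
# any value type as long as the value itself contains no spaces.
notspace = line[/cs2=(\S+)/, 1]

puts greedy.inspect
puts lazy.inspect
puts notspace.inspect
```

This suggests that NOTSPACE would cover integers, words, versions and filenames alike, while values with embedded spaces (such as cs5) still need a different approach.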

I'm trying the following pattern for this part of the message, but it is not working:
(it's all on one line; the line breaks are only for readability)

(%{SPACE})?(cs1=%{INT:cs1.id})?%{SPACE}(cs1Label=%{WORD:cs1.label})?
(%{SPACE})?(cs2=%{INT:cs2.id})?%{SPACE}(cs2Label=%{WORD:cs2.label})?
(%{SPACE})?(cs3=%{INT:cs3.id})?%{SPACE}(cs3Label=%{WORD:cs3.label})?
(%{SPACE})?(cs4=%{INT:cs4.id})?%{SPACE}(cs4Label=%{WORD:cs4.label})?
(%{SPACE})?(cs5=%{INT:cs5.id})?%{SPACE}(cs5Label=%{WORD:cs5.label})?
(%{SPACE})?(cs6=%{INT:cs6.id})?%{SPACE}(cs6Label=%{WORD:cs6.label})?
(%{SPACE})?(cn1=%{INT:cn1.id})?%{SPACE}(cn1Label=%{WORD:cn1.label})?
(%{SPACE})?(cn2=%{INT:cn2.id})?%{SPACE}(cn2Label=%{WORD:cn2.label})?
(%{SPACE})?(cn3=%{INT:cn3.id})?%{SPACE}(cn3Label=%{WORD:cn3.label})?
(%{SPACE})?(cn4=%{INT:cn4.id})?%{SPACE}(cn4Label=%{WORD:cn4.label})?
(%{SPACE})?(cn5=%{INT:cn5.id})?%{SPACE}(cn5Label=%{WORD:cn5.label})?
(%{SPACE})?(cn6=%{INT:cn6.id})?%{SPACE}(cn6Label=%{WORD:cn6.label})?

magnusbaeck · January 7, 2018, 6:24pm

Use a kv filter, not grok.

leandrojmp · January 8, 2018, 5:51pm

Oh, thanks!

kv helped a lot. I'm using grok to parse the beginning of the message and kv for the rest, but I'm still having some problems.

How can I keep the spaces in a value, since space is also the field separator?

For example:

cs5=Install update - Service Pack 2 cs5Label=TaskName cs4=102 cs4Label=TaskId cn2=1 cn2Label=TaskNewState cn1=0 cn1Label=TaskOldState

Using kv gives me only 'Install' as the value for the cs5 key, but I need the full message, which should be 'Install update - Service Pack 2'.

Is there any way to do this using kv? Or will I need to go back to grok and write a pattern for each kind of message?

Badger · January 8, 2018, 7:42pm

I think you would have to resort to ruby code to parse arbitrary CEF extensions. Basically you would need to step through the extension string one (non-escaped) = at a time. Within the text between two =, work backwards from the end to find the last space, which separates the value from the next keyword.
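The steps described above can be sketched in plain Ruby. This is a minimal illustration, not a tested Logstash ruby filter: the method name `parse_cef_extension` is hypothetical, keys are assumed to contain no spaces, and an `=` is treated as escaped whenever a backslash precedes it (so a value ending in an escaped backslash would be misread).

```ruby
# Hypothetical helper: turn a CEF extension string into a Hash.
def parse_cef_extension(ext)
  # Step 1: split on every '=' that is not preceded by a backslash.
  # The -1 limit keeps a trailing empty chunk, so "key=" yields "".
  chunks = ext.strip.split(/(?<!\\)=/, -1)
  fields = {}
  key = chunks.shift # text before the first '=' is the first key

  chunks.each_with_index do |chunk, i|
    if i == chunks.length - 1
      # Last chunk: everything left is the value of the last key.
      value = chunk
      next_key = nil
    else
      # Step 2: work backwards to the last space. What precedes it is
      # the value of the current key, what follows is the next key.
      value, _, next_key = chunk.rpartition(' ')
    end
    # Drop padding between fields and unescape '\=' inside the value.
    fields[key] = value.rstrip.gsub('\\=', '=')
    key = next_key
  end
  fields
end

sample = 'cs5=Install update - Service Pack 2 cs5Label=TaskName ' \
         'cs4=102 cs4Label=TaskId cn2=1 cn2Label=TaskNewState ' \
         'cn1=0 cn1Label=TaskOldState'
parsed = parse_cef_extension(sample)
puts parsed['cs5']
```

Because the value is cut at the last space before the next key rather than at the first space, a value such as 'Install update - Service Pack 2' stays in one piece.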

leandrojmp · January 8, 2018, 8:06pm

Hello,

I solved the problem using a combination of grok and kv.

Since only one of the keys has spaces in its value, I handled that key with a separate grok.

The messages are something like the one below:

Jan  5 11:26:21 server.hostname CEF: 0|KasperskyLab|SecurityCenter|10.4.343|KLPRCI_TaskState|Running|1|rt=1515158609 dhost=a-hostname dst=an-ip-address cs2=KES cs2Label=ProductName cs3=10.3.0.0 cs3Label=ProductVersion cs5=Install update - Service Pack 2 cs5Label=TaskName cs4=102 cs4Label=TaskId cn2=1 cn2Label=TaskNewState cn1=0 cn1Label=TaskOldState

I use grok to parse the first part of the message:

Jan  5 11:26:21 server.hostname CEF: 0|KasperskyLab|SecurityCenter|10.4.343|KLPRCI_TaskState|Running|1|

Then I use GREEDYDATA to capture the rest of the message into a field, which I use as the source for kv and also as the source for another grok that matches only the field with spaces in its value.
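The actual configuration is not shown in this thread, so the following is only a guess at the kind of expression the second grok could rest on, written as a plain Ruby regex. The constant name `CS5_VALUE` is hypothetical. It captures everything after `cs5=` up to the next ` key=` pair or the end of the string, and would cut the value short if the value itself contained something shaped like ` word=`.

```ruby
# Hypothetical regex: lazily capture the cs5 value, stopping right before
# the next "<space>key=" pair or at the end of the string.
CS5_VALUE = /(?:\A|\s)cs5=(?<cs5>.*?)(?=\s+\w+=|\z)/

# Extension part of the sample message above (what GREEDYDATA would hold).
rest = 'rt=1515158609 dhost=a-hostname dst=an-ip-address cs2=KES ' \
       'cs2Label=ProductName cs3=10.3.0.0 cs3Label=ProductVersion ' \
       'cs5=Install update - Service Pack 2 cs5Label=TaskName ' \
       'cs4=102 cs4Label=TaskId cn2=1 cn2Label=TaskNewState ' \
       'cn1=0 cn1Label=TaskOldState'

match = CS5_VALUE.match(rest)
task_name = match && match[:cs5]
puts task_name
```

The lookahead is what keeps the lazy capture from collapsing to an empty string: it forces the match to extend until a following key is in sight.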

I will look into ruby to see if it is something simple, but right now it is working this way.

GlennH · January 12, 2018, 8:04pm

Can you share the contents of your conf? I am looking to do the same.

Thanks,
Glenn

system · February 9, 2018, 8:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
