How to extract the entire value of a complicated field?

Hi guys,

I'm trying to create a Logstash pipeline that parses incoming CEF logs, applies some logic and then outputs each log in JSON format to the console. Some logs are a bit complicated to parse, since the key=value pairs cannot always be separated by a simple delimiter.
Take the following log for example (it is parsed by a grok filter into the default CEF keys and extensions):

"message" => "<14>Apr 27 00:38:45 paloaltovm CEF:0|PaloAltoNetworks|PAN-OS|11.0.1|Succeeded|CONFIG|1|rt=Apr 26 2023 22:38:44 GMT deviceExternalId=63CCFFBBE1C0D07 shost=82.217.2.18 cs3= act=set duser=PaloAlto destinationServiceName=Web msg= vsys  vsys1 rulebase security rules  blabla externalId=7226424316814950417 PanOSDGl1=0 PanOSDGl2=0 PanOSDGl3=0 PanOSDGl4=0 PanOSVsysName= dvchost=paloaltovm PanOSActionFlags=0x0 cs1Label=\"Before Change Detail\" cs1={} cs2Label=\"After Change Detail\" cs2={blabla 55c6e493-a011-4556-ac9e-6f5318913d3d { to [ any ]; from [ any ]; source [ testadress ]; destination [ any ]; source-user [ any ]; category [ any ]; application [ any ]; service [ application-default ]; source-hip [ any ]; destination-hip [ any ]; tag [ tag1 ]; action allow; rule-type universal; description yoyo!; } } PanOSFWDeviceGroup=0 PanOSPolicyAuditComment=auditcommenting"

Some fields, such as "PanOSDGl1" and "dvchost", are easy to parse. But I would love to be able to give the field "cs2" the value:

"{blabla 55c6e493-a011-4556-ac9e-6f5318913d3d { to [ any ]; from [ any ]; source [ testadress ]; destination [ any ]; source-user [ any ]; category [ any ]; application [ any ]; service [ application-default ]; source-hip [ any ]; destination-hip [ any ]; tag [ tag1 ]; action allow; rule-type universal; description yoyo!; } }"

and field "msg" the value:

vsys vsys1 rulebase security rules blabla

I don't know a good way to handle this. I can configure the firewall (which sends the logs to Logstash) to wrap the value of the field "msg" in "{}", but I don't know if this will help, since all other fields are parsed with this filter:

 kv {
    field_split => " "
    value_split => "="
 }

Why not use a cef codec rather than trying to parse CEF using a kv filter? You can use a tcp output/input pair as shown here.
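
For readers without the linked example, the idea is roughly as follows. This is only a sketch, not the linked post's exact config: the port 5514, the loopback host and the split into two separate pipelines are all assumptions made for illustration.

```
# Pipeline A (sketch): write the raw CEF text to a local TCP socket.
output {
  tcp {
    host => "127.0.0.1"   # assumption: both pipelines run on the same host
    port => 5514          # hypothetical port
    codec => line { format => "%{message}" }   # send only the raw message text
  }
}

# Pipeline B (sketch): read the text back and let the cef codec decode it.
input {
  tcp {
    port => 5514          # must match the port used by the output above
    # "\n" is only read as a newline when config.support_escapes: true
    # is set in logstash.yml
    codec => cef { delimiter => "\n" }
  }
}
```

The point of the round trip is that the cef codec decodes on an input, so the CEF extensions are parsed by a decoder that understands the format instead of by a generic kv split on spaces.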

I have tried that but the only output that was generated is this:

CEF:0|Elasticsearch|Logstash|1.0|Logstash|Logstash|6|

That is what you would get if you used a cef codec on an output, which is not what I linked to.

Ah I see what you mean now.

It does help with the parsing of the complicated fields since it now parses the whole message.

However, it does generate multiple _grokparsefailure tags, and some fields are now missing, such as the "message" field, which showed the raw log message. Also, the structure of the output seems a bit off. I checked the result after removing the grok filter, but that broke my logic filters, since many of the if statements look at field names.

These logs are from a Palo Alto device, right? Are you able to change the format of the logs that are being sent?

If you can, I would recommend changing it to send the logs as CSV, which makes them pretty easy to parse.

I needed to receive logs from Palo Alto devices a while ago, and the first thing I asked the network team was to send the logs as CSV, not CEF.

Thanks for the advice! I made it work by editing the Palo Alto log format so that the fields are delimited by ";" instead of a space, and then using the kv filter plugin with field_split => ";" to capture each complete value.

Now the full value of each field is correctly parsed.
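
For anyone landing here later, the resulting filter would look roughly like this. It is a minimal sketch: the source field name is an assumption, so adjust it to whichever field holds the ";"-delimited extensions in your pipeline.

```
filter {
  kv {
    source => "message"    # assumption: field containing the key=value text
    field_split => ";"     # fields are now separated by ";" instead of " "
    value_split => "="     # key and value are still separated by "="
  }
}
```

One thing worth testing: a value that itself contains ";" (as the cs2 example earlier in this thread does) could still be cut short at the first ";", so such values may need quoting or a delimiter that never appears in the data.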
