How to extract the entire value of a complicated field?

Hi guys,

I'm trying to create a Logstash pipeline that parses incoming CEF logs, applies some logic, and then outputs each log in JSON format to the console. Some logs are complicated to parse because the key=value pairs cannot always be separated by a simple delimiter.
Take the following log, for example (it is parsed by a grok filter into the default CEF keys and extensions):

"message" => "<14>Apr 27 00:38:45 paloaltovm CEF:0|PaloAltoNetworks|PAN-OS|11.0.1|Succeeded|CONFIG|1|rt=Apr 26 2023 22:38:44 GMT deviceExternalId=63CCFFBBE1C0D07 shost=82.217.2.18 cs3= act=set duser=PaloAlto destinationServiceName=Web msg= vsys  vsys1 rulebase security rules  blabla externalId=7226424316814950417 PanOSDGl1=0 PanOSDGl2=0 PanOSDGl3=0 PanOSDGl4=0 PanOSVsysName= dvchost=paloaltovm PanOSActionFlags=0x0 cs1Label=\"Before Change Detail\" cs1={} cs2Label=\"After Change Detail\" cs2={blabla 55c6e493-a011-4556-ac9e-6f5318913d3d { to [ any ]; from [ any ]; source [ testadress ]; destination [ any ]; source-user [ any ]; category [ any ]; application [ any ]; service [ application-default ]; source-hip [ any ]; destination-hip [ any ]; tag [ tag1 ]; action allow; rule-type universal; description yoyo!; } } PanOSFWDeviceGroup=0 PanOSPolicyAuditComment=auditcommenting"

Some fields, such as "PanOSDGl1" and "dvchost", are easy to parse. But I would like the field "cs2" to get the value:

"{blabla 55c6e493-a011-4556-ac9e-6f5318913d3d { to [ any ]; from [ any ]; source [ testadress ]; destination [ any ]; source-user [ any ]; category [ any ]; application [ any ]; service [ application-default ]; source-hip [ any ]; destination-hip [ any ]; tag [ tag1 ]; action allow; rule-type universal; description yoyo!; } }"

and field "msg" the value:

vsys vsys1 rulebase security rules blabla

I don't know a good way to handle this. I can configure the firewall (which sends the logs to Logstash) to place the value of the field "msg" in "{}", but I don't know if this will help since all other fields are parsed with the filter:

 kv {
    field_split => " "
    value_split => "="
 }

Why not use a cef codec rather than trying to parse CEF using a kv filter? You can use a tcp output/input pair as shown here.
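
For anyone who has not seen that pattern, here is a minimal sketch of a tcp output/input pair. It assumes the raw CEF text is in the [message] field and that the two halves run as separate pipelines. The port number 5555 and the pipeline split are assumptions for illustration, not details taken from the linked post.

```
# Pipeline A (hypothetical): forward the raw line to a local TCP port.
output {
  tcp {
    host => "127.0.0.1"
    port => 5555                               # assumed free port
    codec => line { format => "%{message}" }   # send only the raw text
  }
}

# Pipeline B (hypothetical): read the line back, decoding it with the cef codec.
input {
  tcp {
    host => "127.0.0.1"
    port => 5555
    codec => cef { delimiter => "\n" }         # the line codec ends each event with a newline
  }
}
```

The point is that the cef codec sits on an input, so it decodes CEF text into fields. On an output it would encode events as CEF instead.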

I tried that, but the only output generated was this:

CEF:0|Elasticsearch|Logstash|1.0|Logstash|Logstash|6|

That is what you would get if you used a cef codec on an output, which is not what I linked to.

Ah I see what you mean now.

It does help with the complicated fields, since the codec now parses the whole message.

However, it produces multiple _grokparsefailure tags, and some fields are now missing, such as the "message" field, which held the raw log message. The structure of the output also seems a bit off. I checked the result after removing the grok filter, but that broke my logic filters, since many of the if statements depend on the field names.

These logs are from a Palo Alto device, right? Are you able to change the format of the logs being sent?

If you can, I would recommend changing it to send the logs as CSV, which makes them much easier to parse.

I needed to receive logs from Palo Alto devices a while ago, and the first thing I asked the network team was to send the logs as CSV, not CEF.
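
As a rough sketch of what the CSV route could look like on the Logstash side, assuming the comma-separated text ends up in [message] with any syslog header already stripped. The column names below are placeholders I made up; the real names and order have to come from the PAN-OS documentation for the log type you receive.

```
filter {
  csv {
    source => "message"      # assumption: the CSV text is in [message]
    separator => ","
    # Hypothetical column names, for illustration only.
    columns => ["receive_time", "serial_number", "log_type"]
  }
}
```

Because the csv filter respects quoted values, a field that contains spaces or braces stays in one piece as long as the device quotes it.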

Thanks for the advice! I made it work by editing the Palo Alto log format to use ";" instead of a space as the field delimiter, and then using the kv filter plugin with field_split => ";" so that the complete values are captured.

Now the full value of each field is correctly parsed.
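
For reference, a minimal sketch of the resulting filter, assuming the key=value part of the log is in [message] (in a pipeline that runs grok first, the source would be whatever field grok stores the extensions in):

```
filter {
  kv {
    source => "message"    # assumption: adjust to the field holding the key=value pairs
    field_split => ";"     # fields are now separated by ";" instead of a space
    value_split => "="
  }
}
```

One caveat: a value that itself contains ";" (the cs2 sample above does) may still be cut at the first ";", so it is safer to pick a delimiter that never appears inside any value.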