[bug] Syslog RFC 5424 parser does not properly handle escaped characters in structured data

Hello,

I'm trying to collect logs from different appliances by using an Elastic Agent 8.14.3 with the Custom UDP Logs integration 1.19.1.

One of the appliances sends RFC 5424 formatted logs, with a structured data part and a regular message part.

The structured data has values containing escaped "]" and this seems to trigger a bug in the parser if the escaped "]" is not the last character of a value.

For example, the following message is properly parsed:

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011" somekey="[value\]"][examplePriority@32473 class="high"] Some message

and the extracted structured data is as expected:

[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011" somekey="[value\]"][examplePriority@32473 class="high"]

But this message is not:

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011" somekey="[value\] more data"][examplePriority@32473 class="high"] Some message

for which the extracted structured data is:

more data"][examplePriority@32473 class="high"]

Note: The messages above can be tested directly in the unit tests for the RFC 5424 parser of the syslog processor in libbeat:

https://github.com/elastic/beats/blob/main/libbeat/reader/syslog/rfc5424_test.go

Hi @pcollardez,

Thank you for this report. This is indeed a bug in the parser. I've opened a public issue for it on GitHub: "Syslog reader/processor does not handle escaped brackets in structured data fields" (elastic/beats issue #40445).

There is also a bug if the regular message contains a "]":

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011" somekey="value"][examplePriority@32473 class="high"] Some message [value] more data

In this case, the extracted "structured" data is:

Some message [value]

and the regular message:

more data

instead of the expected ones.
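Both failures above come down to where the structured data is considered to end. As a minimal sketch (not the libbeat implementation), the hypothetical helper `splitStructuredData` below walks the SD-ELEMENTs one by one, skips any backslash-escaped character inside a quoted value, and only treats a "]" outside quotes as the end of an element. It assumes the input starts at the first "[" and does not handle the "-" NILVALUE case.

```go
package main

import "fmt"

// splitStructuredData is a hypothetical helper for illustration only.
// s must start at the STRUCTURED-DATA part of an RFC 5424 line.
// It returns the structured data, the free-form message that follows it,
// and false if an SD-ELEMENT is missing or never closed.
func splitStructuredData(s string) (sd, msg string, ok bool) {
	i := 0
	// Consume consecutive SD-ELEMENTs: [id k="v" ...][id2 ...]
	for i < len(s) && s[i] == '[' {
		inQuotes, closed := false, false
		for i++; i < len(s) && !closed; i++ {
			switch c := s[i]; {
			case inQuotes && c == '\\' && i+1 < len(s):
				i++ // skip the escaped character (\" \\ or \])
			case c == '"':
				inQuotes = !inQuotes
			case c == ']' && !inQuotes:
				closed = true // the loop's i++ then moves past this ']'
			}
		}
		if !closed {
			return "", "", false
		}
	}
	switch {
	case i == 0:
		return "", "", false // no SD-ELEMENT found
	case i == len(s):
		return s, "", true // structured data only, no message
	case s[i] == ' ':
		return s[:i], s[i+1:], true
	}
	return "", "", false
}

func main() {
	inputs := []string{
		`[exampleSDID@32473 somekey="[value\] more data"][examplePriority@32473 class="high"] Some message`,
		`[exampleSDID@32473 somekey="value"][examplePriority@32473 class="high"] Some message [value] more data`,
	}
	for _, in := range inputs {
		sd, msg, ok := splitStructuredData(in)
		fmt.Printf("ok=%v\n  sd=%q\n  msg=%q\n", ok, sd, msg)
	}
}
```

Because scanning stops at the first character that follows a closed element and is not "[", any "]" later in the regular message is never looked at. The sketch is more lenient than the RFC in one respect: an unescaped "]" inside a quoted value is accepted rather than rejected.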

Separately, it seems that the UTF-8 BOM (\xEFBBBF) is not processed if present at the beginning of the regular message:

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011" somekey="value"][examplePriority@32473 class="high"] \xEFBBBFSome message

but that would only forward the BOM in the value of the field, which is not really a bug or a real problem as long as there is no decoding attempt.

I've got a fix in place for ']' in the regular message as well.

Regarding the BOM, is that even the correct escape sequence? To me it just looks like hex 'EF' followed by a literal "BBBF". We already have tests in place for the BOM and I can't reproduce the issue. We use " \ufeff" as the escape sequence in the tests and in the live code. This is also what Go uses: text/encoding/unicode/override.go in golang/text.

EDIT: For a hex escape sequence, I would expect to see something like this: " \xEF\xBB\xBF". I tested with this in our unit tests and, as expected, the BOM is not included in the final message.
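To make the difference between the two spellings concrete, here is a small Go sketch. In a Go string literal, `\x` consumes exactly two hex digits, so "\xEFBBBF" is the byte 0xEF followed by the four ASCII letters "BBBF", while "\xEF\xBB\xBF" is the three-byte UTF-8 encoding of U+FEFF. The helper `stripBOM` is a hypothetical name used only for this illustration, not a function from libbeat.

```go
package main

import (
	"fmt"
	"strings"
)

// bom is the UTF-8 byte order mark, U+FEFF, encoded as EF BB BF.
const bom = "\ufeff"

// stripBOM is a hypothetical helper: it removes one leading UTF-8 BOM, if present.
func stripBOM(msg string) string {
	return strings.TrimPrefix(msg, bom)
}

func main() {
	wrong := "\xEFBBBF"     // one byte 0xEF, then the letters B, B, B, F
	right := "\xEF\xBB\xBF" // the three BOM bytes

	fmt.Printf("% x\n", wrong) // ef 42 42 42 46
	fmt.Printf("% x\n", right) // ef bb bf
	fmt.Println(right == bom)  // true

	fmt.Printf("%q\n", stripBOM(right+"Some message")) // "Some message"
	fmt.Printf("%q\n", stripBOM(wrong+"Some message")) // prefix left untouched
}
```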

Thank you for the fixes about the brackets.

About the BOM, you're right: I made a mistake, I meant \xEF\xBB\xBF.