Filebeat multiline parsing with regex flushes lines containing ^M

Hello Community,

We have set up Filebeat to ingest XML-like logs in our on-prem environments, using the following configuration in filebeat.yml:

- type: filestream
  enabled: true
  paths:
    - /cognos/*/logs/XQE/*.xml  # Apply multiline to this file only
  parsers:
  - multiline:
      type: pattern
      pattern: '^<event '
      negate: true
      match: after
      skip_newline: true

We assumed it was working correctly, since we match on the beginning of each record (a line starting with <event ) rather than on the closing tag, because in some cases we may find nested events.
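For anyone trying to reproduce this, here is a rough Python model of how I understand the settings above to group lines: a line matching the pattern starts a new event, and every following non-matching line is appended to it (negate: true, match: after), joined without a separator (skip_newline: true). This is only a sketch of the documented behaviour, not Filebeat's code. The function name `group_multiline` is made up for illustration, and it does not model timeouts, max_lines, or how Filebeat splits the file into lines in the first place.

```python
import re


def group_multiline(lines, pattern=r"^<event ", skip_newline=True):
    """Simplified model of multiline with negate: true and match: after.

    A line matching `pattern` starts a new event; each following line that
    does not match is appended to the current event.
    """
    start = re.compile(pattern)
    separator = "" if skip_newline else "\n"
    events = []
    current = None
    for line in lines:
        if current is None or start.search(line):
            # Either the very first line, or a new "<event " record begins.
            if current is not None:
                events.append(current)
            current = line
        else:
            # Continuation line: glue it onto the event being built.
            current += separator + line
    if current is not None:
        events.append(current)
    return events


sample = [
    '<event a="1"><![CDATA[select 1',
    'from t]]></event>',
    '<event a="2"><![CDATA[x]]></event>',
]
events = group_multiline(sample)
```

With this sample, the first two lines end up in one event and the third line becomes a second event, which is the behaviour we expected from the real log as well.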

Everything looked fine after our initial testing, but we have now found missing logs that seem to be related to special characters in the log itself.
When a log entry contains the ^M character (a carriage return), Filebeat sometimes flushes the whole line and moves on to the next one, which results in the missing logs.

Here's an example of the log:

<event component="XQE" group="JDBC" level="INFO" thread="6826" timestamp="2025-03-03 22:27:30.146" contextId="123124" requestId="XXXXXXXXXXXXX:XXXXXXXXXXXXX" sessionId="XXXXXXXXXXXXX"><![CDATA[[73464] [Start]Statement.execute(SELECT CODEPAGE.VALUE || '.' || COLLNAME.VALUE, CASE WHEN 'A' = 'a' and 'é' = 'e' THEN 'CI_AI' WHEN 'A' = 'a' and 'é' <> 'e' THEN 'CI_AS' WHEN 'A' <> 'a' and 'é' <> 'e' THEN 'CS_AS' ELSE 'CS_AI' END as COLLATOR_STRENGTH FROM (SELECT VALUE FROM SYSIBMADM.DBCFG WHERE NAME = 'codepage') CODEPAGE, (SELECT VALUE FROM SYSIBMADM.DBCFG WHERE NAME = 'db_collname') COLLNAME)]]></event>
<event component="XQE" group="JDBC" level="INFO" thread="6826" timestamp="2025-03-03 22:27:30.242" contextId="123124" requestId="XXXXXXXXXXXXX:XXXXXXXXXXXXX" sessionId="XXXXXXXXXXXXX"><![CDATA[[73469] [Start]Statement.execute(/* user=SS TEST User reportPath= queryName= REMOTE_ADDR= SERVER_NAME= requestID=XXXXXXXXXXXXX:XXXXXXXXXXXXX */ ^M

The first one is captured correctly; the second is flushed and never ingested.
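To check whether the carriage returns really line up with the missing events, a small Python sketch like the one below can list where they occur in a file. It assumes the file is read as raw bytes and that lines are counted by LF only. The function name `find_carriage_returns` is hypothetical, and it only reports where CR characters are; it does not tell us how Filebeat treats them.

```python
import re

# A carriage return (0x0D), optionally followed by a line feed (0x0A).
CR = re.compile(rb"\r(\n?)")


def find_carriage_returns(data: bytes):
    """Return (line_number, kind) for every CR found in `data`.

    `kind` is "CRLF" when the CR is immediately followed by LF and
    "bare CR" otherwise. Line numbers are 1-based and count LF only.
    """
    hits = []
    for match in CR.finditer(data):
        line_number = data.count(b"\n", 0, match.start()) + 1
        kind = "CRLF" if match.group(1) else "bare CR"
        hits.append((line_number, kind))
    return hits


# Tiny in-memory example: line 1 ends in CRLF, line 2 has a CR in the middle.
example = b"a\r\nb \rc\nd\n"
found = find_carriage_returns(example)
```

On a real file this would be called with the bytes from `open(path, "rb").read()`, where `path` is one of the XQE log files. If the reported line numbers match the events that never reach Elasticsearch, that would support the idea that the ^M is involved.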
