Problem with multiline pattern matching

Hi guys!

I'm turning to you because I've run into a problem configuring the multiline pattern that aggregates several log lines into a single document.

Here is an excerpt from my log:

2023-08-17 16:13:15.389 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.DialogContext:91 - [/] - f5a1f043-b97f-4ef8-977d-3bcec91d2ab2 Outgoing: <?xml version="1.0" encoding="UTF-8" standalone="no"?><isomsg mti="0854">
    <field id="11" value="075502"/>
    <field id="24" value="860"/>
    <field id="39" value="0000"/>
    <field id="46">
        <subfield id="df54" value="31"/>
    </field>
</isomsg>
2023-08-17 16:13:15.391 |CB RCPD| [INFO] com.payxpert.cbrcpd.cbcom.certified.TlcTlpCBCOMChannel:91 - [/] - Outgoing: 0854002001000204000007550208603030303005DF54000131
2023-08-17 16:13:15.392 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.Cb2aStateMachineRunner:105 - [/] - State transition: AcceptCloseDialogState -> FinalStateDialogClosedState

I would like the lines above to be sent as 3 separate documents.

Here is the relevant section of my filebeat.yml configuration:

- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: cb-rcpd-input

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/cb-rcpd/cb-rcpd.log

parsers:
- multiline:
    type: pattern
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: false
    match: after

For context, the logs are sent to Elasticsearch and processed by a grok pattern that only matches lines of this form:

2023-08-17 16:13:15.391 |CB RCPD| [INFO] com.payxpert.cbrcpd.cbcom.certified.TlcTlpCBCOMChannel:91 - [/] - ....

Lines starting with something other than a date are therefore dropped.

And this is where I run into a problem with my Filebeat pattern:
2023-08-18T15:58:33.811Z WARN [elasticsearch] elasticsearch/client.go:414 Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2023, time.August, 18, 15, 58, 32, 785557405, time.Local), Meta:null, Fields:{"agent":{"ephemeral_id":"fc9bb304-ad55-4a62-93e4-063fea737860","hostname":"mpads01","id":"195e33cf-1f2d-4e4b-9f4d-80ba8d1b22b5","name":"mpads01","type":"filebeat","version":"7.17.6"},"ecs":{"version":"1.12.0"},"fields":{"env":"staging"},"host":{"architecture":"x86_64","containerized":false,"hostname":"mpads01","id":"0682dedde5e545d3a97e3b32bcdacd55","ip":["10.10.11.170"],"mac":["00:16:3e:9c:40:ad"],"name":"mpads01","os":{"codename":"bookworm","family":"debian","kernel":"6.1.0-11-amd64","name":"Debian GNU/Linux","platform":"debian","type":"linux","version":"12 (bookworm)"}},"input":{"type":"filestream"},"log":{"file":{"path":"/var/log/cb-rcpd/cb-rcpd.log"},"offset":460216},"message":"\u003c/isomsg\u003e"}, Private:(*input_logfile.updateOp)(0xc0016312c0), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"Provided Grok expressions do not match field value: [</isomsg>]"}, dropping event!

It looks like the lines don't match the pattern, so everything gets sent to Elasticsearch individually, which causes the error. Grok then drops the lines that don't start with a date, which is expected.

Thank you very much for any help or leads.

Maxence

Hi @mcondamin
Please share your ingest pipeline.

In short, you just need to add another grok pattern that matches the rest of the content and put that content in another field.

grok patterns will be tried in the order they are listed, so you can have one for the normal lines and one that parses the normal line plus the multiline content

There is some special syntax for multiline, so post your ingest pipeline and perhaps we can help.

If I recall correctly, multiline grok has the (?m) at the beginning...

match => [ "message", "(?m)%{TIMESTA...

Or you can create a custom pattern like this and use it at the end of the grok:

"pattern_definitions" : {
      "GREEDYMULTILINE" : "(.|\n)*"
    }
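
Put together, the two-pattern idea would look roughly like this — a sketch only, with placeholder field names (the first pattern stops at the first newline, the second one swallows the multiline content):

{
  "grok": {
    "field": "message",
    "patterns": [
      "%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:msg}",
      "%{TIMESTAMP_ISO8601:ts} %{GREEDYMULTILINE:msg}"
    ],
    "pattern_definitions": {
      "GREEDYMULTILINE": "(.|\n)*"
    }
  }
}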

Hi @stephenb!

Thank you for your answer!
I thought the aggregation of lines into a single document happened at the Filebeat level, so that only documents starting with a date were sent to the ingest pipeline.

Here is the ingest pipeline that I use on the Elasticsearch side:

[
  {
    "grok": {
      "field": "message",
      "patterns": [
        "%{TIMESTAMP_ISO8601:@timestamp} \\|%{APPLI:appli}\\| \\[%{LOGLEVEL}\\] %{JAVACLASS}:%{BASE16NUM:class_num} - \\[%{PROG:transaction_id}\\] - %{GREEDYDATA:msg}"
      ],
      "pattern_definitions": {
        "GREEDYDATA": "(.|\n)*",
        "APPLI": "[\\sa-zA-Z0-9._-]+"
      }
    }
  },
  {
    "date": {
      "field": "@timestamp",
      "formats": [
        "yyyy-MM-dd HH:mm:ss.SSS"
      ],
      "target_field": "@timestamp"
    }
  }
]

I'm reviving this thread because I still can't find a solution. I tried adding a second grok pattern after the first, with just ".*" to accept everything, but it doesn't work any better. Is there something I misunderstood, or didn't understand at all?

First question: if you take out the ingest pipeline entirely, are you seeing the single-line and multiline messages in the message field in Discover? This is a prerequisite.

Second: with the ingest pipeline in place, does your grok work for the non-multiline messages?
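
For the second point, you can also test the grok without Filebeat in the loop using the simulate API — a minimal sketch, assuming your pipeline is called cb-rcpd-pipeline (substitute your real pipeline id and one of your real log lines):

POST _ingest/pipeline/cb-rcpd-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2023-08-17 16:13:15.391 |CB RCPD| [INFO] com.payxpert.cbrcpd.cbcom.certified.TlcTlpCBCOMChannel:91 - [/] - Outgoing: 0854002001000204000007550208603030303005DF54000131"
      }
    }
  ]
}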

Answer those two and perhaps we can get this solved... I am AFK so may not be able to help soon.

Thank you very much for your help, it's really kind of you in any case!

To answer your two questions:
1/ It's a bit more complicated than that, actually. I can't disable my ingest pipeline, as it is currently in use and working for other apps. In fact, this all started because I wanted to optimize the way I handle multiline in Filebeat.
On the other applications, negate is false, so my pattern works the other way around: I try to match the various continuation lines themselves (roughly like the sketch after point 2 below). But since those lines are all very different, it's complicated to maintain, and I wanted to do the opposite for this new application.

2/ Indeed, I hadn't noticed, but it is not being indexed either. And unlike the multiline logs, where I can see in the Filebeat logs that grok rejects them, in the single-line case I see no message from Filebeat at all...
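
To illustrate point 1, on the other apps the parser looks something like this (the continuation pattern below is just an illustration, not my real one):

  parsers:
  - multiline:
      type: pattern
      # with negate: false + match: after, lines that MATCH the pattern
      # are appended to the previous line that does not match
      pattern: '^(\s+|</?)'
      negate: false
      match: after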

Perhaps you should set up a little test env...

Then just put a mix of the logs in a file and test that.

That would be my suggestion....

Reduce the complexity to make debugging easier.

You can set up Kibana and Elasticsearch with Docker in a couple of minutes... here
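
For a quick throwaway single-node setup, something along these lines works (image versions and container names are just examples):

docker network create elastic

docker run -d --name es-test --net elastic -p 9200:9200 \
  -e "discovery.type=single-node" -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.9.0

docker run -d --name kb-test --net elastic -p 5601:5601 \
  -e "ELASTICSEARCH_HOSTS=http://es-test:9200" \
  docker.elastic.co/kibana/kibana:8.9.0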

@mcondamin

Got a chance to look at this

First, your filebeat.yml has a couple of errors.

The indentation is wrong (parsers must be nested under the filestream input, otherwise it is not applied), and the multiline setting is incorrect: negate should be true, not false. With negate: true and match: after, every line that does not start with a date is appended to the previous line that does.

Here is my section that works...

# Discuss
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: cb-rcpd-input

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/multi-line/discuss-ingest.log

  parsers:
  - multiline:
      type: pattern
      pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
      negate: true
      match: after

And here is my pipeline

PUT _ingest/pipeline/discuss-multiline
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          """%{TIMESTAMP_ISO8601:@timestamp} \|%{APPLI:appli}\| \[%{LOGLEVEL}\] %{JAVACLASS}:%{BASE16NUM:class_num} - \[%{PROG:transaction_id}\] - %{GREEDYDATA:msg}""",
          """%{TIMESTAMP_ISO8601:@timestamp} \|%{APPLI:appli}\| \[%{LOGLEVEL}\] %{JAVACLASS}:%{BASE16NUM:class_num} - \[%{PROG:transaction_id}\] - %{MULTILINEGREEDYDATA:msg}"""

        ],
        "pattern_definitions": {
          "MULTILINEGREEDYDATA": """(.|\n)*""",
          "APPLI": """[\sa-zA-Z0-9._-]+"""
        }
      }
    },
    {
      "date": {
        "field": "@timestamp",
        "formats": [
          "yyyy-MM-dd HH:mm:ss.SSS"
        ],
        "target_field": "@timestamp"
      }
    }
  ]
}

Here is my sample file

2023-08-17 16:13:15.389 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.DialogContext:91 - [/] - f5a1f043-b97f-4ef8-977d-3bcec91d2ab2 Outgoing: <?xml version="1.0" encoding="UTF-8" standalone="no"?><isomsg mti="0854">
    <field id="11" value="075502"/>
    <field id="24" value="860"/>
    <field id="39" value="0000"/>
    <field id="46">
        <subfield id="df54" value="31"/>
    </field>
</isomsg>
2023-08-17 16:13:15.391 |CB RCPD| [INFO] com.payxpert.cbrcpd.cbcom.certified.TlcTlpCBCOMChannel:91 - [/] - Outgoing: 0854002001000204000007550208603030303005DF54000131
2023-08-17 16:13:15.392 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.Cb2aStateMachineRunner:105 - [/] - State transition: AcceptCloseDialogState -> FinalStateDialogClosedState

And here are my results

GET filebeat-*/_search

# Results

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-filebeat-8.9.0-2023.08.25-000001",
        "_id": "NSkIKooB2sAptbIL1xqP",
        "_score": 1,
        "_source": {
          "transaction_id": "/",
          "msg": """f5a1f043-b97f-4ef8-977d-3bcec91d2ab2 Outgoing: <?xml version="1.0" encoding="UTF-8" standalone="no"?><isomsg mti="0854">""",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "name": "hyperion.local",
            "id": "bc6fa02a-862e-4019-80db-fd6b422889b9",
            "ephemeral_id": "16fcf574-4be3-427e-8118-ad89deeeba3a",
            "type": "filebeat",
            "version": "8.9.0"
          },
          "@timestamp": "2023-08-17T16:13:15.389Z",
          "ecs": {
            "version": "8.0.0"
          },
          "log": {
            "file": {
              "path": "/Users/sbrown/workspace/sample-data/discuss/multi-line/discuss-ingest.log"
            },
            "offset": 420,
            "flags": [
              "multiline"
            ]
          },
          "appli": "CB RCPD",
          "host": {
            "name": "hyperion.local"
          },
          "message": """2023-08-17 16:13:15.389 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.DialogContext:91 - [/] - f5a1f043-b97f-4ef8-977d-3bcec91d2ab2 Outgoing: <?xml version="1.0" encoding="UTF-8" standalone="no"?><isomsg mti="0854">
    <field id="11" value="075502"/>
    <field id="24" value="860"/>
    <field id="39" value="0000"/>
    <field id="46">
        <subfield id="df54" value="31"/>
    </field>
</isomsg>""",
          "class_num": "91"
        }
      },
      {
        "_index": ".ds-filebeat-8.9.0-2023.08.25-000001",
        "_id": "NikIKooB2sAptbIL1xqP",
        "_score": 1,
        "_source": {
          "transaction_id": "/",
          "msg": "Outgoing: 0854002001000204000007550208603030303005DF54000131",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "name": "hyperion.local",
            "id": "bc6fa02a-862e-4019-80db-fd6b422889b9",
            "type": "filebeat",
            "ephemeral_id": "16fcf574-4be3-427e-8118-ad89deeeba3a",
            "version": "8.9.0"
          },
          "@timestamp": "2023-08-17T16:13:15.391Z",
          "ecs": {
            "version": "8.0.0"
          },
          "log": {
            "file": {
              "path": "/Users/sbrown/workspace/sample-data/discuss/multi-line/discuss-ingest.log"
            },
            "offset": 430
          },
          "appli": "CB RCPD",
          "host": {
            "name": "hyperion.local"
          },
          "message": "2023-08-17 16:13:15.391 |CB RCPD| [INFO] com.payxpert.cbrcpd.cbcom.certified.TlcTlpCBCOMChannel:91 - [/] - Outgoing: 0854002001000204000007550208603030303005DF54000131",
          "class_num": "91"
        }
      },
      {
        "_index": ".ds-filebeat-8.9.0-2023.08.25-000001",
        "_id": "NykIKooB2sAptbIL6RoQ",
        "_score": 1,
        "_source": {
          "transaction_id": "/",
          "msg": "State transition: AcceptCloseDialogState -> FinalStateDialogClosedState",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "name": "hyperion.local",
            "id": "bc6fa02a-862e-4019-80db-fd6b422889b9",
            "type": "filebeat",
            "ephemeral_id": "16fcf574-4be3-427e-8118-ad89deeeba3a",
            "version": "8.9.0"
          },
          "@timestamp": "2023-08-17T16:13:15.392Z",
          "ecs": {
            "version": "8.0.0"
          },
          "log": {
            "file": {
              "path": "/Users/sbrown/workspace/sample-data/discuss/multi-line/discuss-ingest.log"
            },
            "offset": 598
          },
          "appli": "CB RCPD",
          "host": {
            "name": "hyperion.local"
          },
          "message": "2023-08-17 16:13:15.392 |CB RCPD| [INFO] com.payxpert.cbrcpd.cb2a.certified.parameterdownloading.Cb2aStateMachineRunner:105 - [/] - State transition: AcceptCloseDialogState -> FinalStateDialogClosedState",
          "class_num": "105"
        }
      }
    ]
  }
}

Oh thank you!!! :partying_face:
Thanks to you I just realized what was wrong with my configuration!
It was just the wrong indentation of the parsers section (the negate: false was a bad copy-paste from a test platform).

I think that because the indentation was wrong, the settings I had entered were simply not taken into account!
So much wasted time because of this :o

No need to modify my grok pattern in the end.

Thanks again for the time you spent on my problem!
