Multiline pattern not working for more complex cases

Hi all!

I'm facing an issue while collecting logs using Filebeat (7.x) together with a Logstash pipeline.
I've handled the basic multiline case by following the Elastic documentation. However, in some scenarios error stack traces are written with additional information (JSON bodies, HTTP headers) that doesn't match the regex pattern I use, so lots of events end up tagged with _grokparsefailure in Logstash during ingestion.

Below is a sample of what the logs look like:

2021-03-10T00:27:21.0691085+00:00 0HM730JRVJI5F:0000001F [ERR] Something went wrong {
  "applicationName": "...",
  "id": null,
  "responseCode": 409,
  "responseDescription": "UNABLE_TO_LOCK_ROW"
},https://....
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   Headers:
{
  Date: Wed, 10 Mar 2021 00:27:21 GMT
  Server: nginx
  Connection: keep-alive
  Content-Type: application/json; charset=UTF-8
  Content-Length: 314
},Method: PATCH, RequestUri: ...
{
  Transfer-Encoding: chunked
  Content-Type: application/json; charset=utf-8
} (32238518)
2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] Request finished in 6041.9749ms 409 application/json (791a596a)

This is how the multiline pattern is configured in Filebeat:

  multiline.type: pattern
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after

As you can see, my regex pattern only looks for leading whitespace, which works for most cases, but not when JSON data is logged as well.

Ideally, I want each record to start at a line that begins with the timestamp and continue until the next line that begins with a timestamp.

With the current configuration, lines such as "},https://...." (line 6 of the sample) are interpreted as the start of a new event, which then gets tagged with _grokparsefailure at ingestion.
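To see why, here is a small Python sketch that mimics how Filebeat's pattern-based multiline grouping behaves with match: after. It is a simplified model, not Filebeat's code: it ignores max_lines and timeout, the helper name group_after is hypothetical, and it uses Python's \s in place of the POSIX class [[:space:]], which Python's re module does not support. The sample is an abridged copy of the log above.

```python
import re


def group_after(lines, pattern, negate):
    """Simplified model of Filebeat multiline grouping with match: after.

    A line is a continuation when its match result differs from `negate`:
    with negate=False, lines that MATCH are appended to the previous event;
    with negate=True, lines that do NOT match are appended.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        is_continuation = (regex.search(line) is not None) != negate
        if is_continuation and events:
            events[-1].append(line)
        else:
            events.append([line])  # this line starts a new event
    return events


# Abridged copy of the first log sample from the question.
sample = [
    "2021-03-10T00:27:21.0691085+00:00 0HM730JRVJI5F:0000001F [ERR] Something went wrong {",
    '  "applicationName": "...",',
    '  "responseDescription": "UNABLE_TO_LOCK_ROW"',
    "},https://....",
    "   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)",
    "   Headers:",
    "{",
    "  Server: nginx",
    "},Method: PATCH, RequestUri: ...",
    "2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] Request finished in 6041.9749ms 409 application/json (791a596a)",
]

# Current config: pattern '^[[:space:]]', negate: false, match: after
# ([[:space:]] approximated here by \s).
events = group_after(sample, r"^\s", negate=False)
print(len(events))  # 5: the '},https://....', '{' and '},Method: ...' lines each start an event
for event in events:
    print(event[0])
```

Every line that does not begin with whitespace starts its own event, so the closing braces of the JSON and header blocks split one error into several fragments that grok cannot parse.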

How can I get around this?

Have you tried adjusting the regex pattern to look for a timestamp at the start of the line? Something like this:

^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
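A quick way to sanity-check a candidate pattern is to run it over the sample lines and see which ones would count as the start of an event. The Python sketch below uses the regex above verbatim; Python's re and the Go regexp engine that Filebeat uses should behave the same for this simple pattern on ASCII log lines. The list is an abridged copy of the sample log.

```python
import re

# Pattern suggested above, used as-is.
TIMESTAMP_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

# Abridged copy of the log sample from the question.
lines = [
    "2021-03-10T00:27:21.0691085+00:00 0HM730JRVJI5F:0000001F [ERR] Something went wrong {",
    '  "responseCode": 409,',
    "},https://....",
    "   Headers:",
    "{",
    "  Date: Wed, 10 Mar 2021 00:27:21 GMT",
    "} (32238518)",
    "2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] Request finished in 6041.9749ms 409 application/json (791a596a)",
]

# Only lines that begin with the timestamp should start a new event.
starts = [line for line in lines if TIMESTAMP_START.match(line)]
for line in starts:
    print(line)
```

Only the two header lines match. Which lines match is only half of the setup, though: multiline.negate and multiline.match decide what Filebeat does with the matching and non-matching lines.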

Shaunak

Thanks for your reply, @shaunak. I tried your suggestion, but it made things even worse than before: a few records that should have produced only 5 log entries produced 43.

In the end, the Beat configuration looked like this:

  multiline.type: pattern
  multiline.pattern: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}'
  multiline.negate: false

Also, I'm afraid this pattern may not work, since a similar date pattern can appear in the JSON as well:

2021-03-10T00:27:21.0691085+00:00 0HM730JRVJI5F:0000001F [ERR] error message
  "applicationName": "",
  "id": null,
  "responseCode": 409,
  "serverResponseCode": null,
  "responseDateTime": "2021-03-10T00:27:21.046Z",
  "responseMessage": "...",
  "responseDescription": "..."
}
  at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware....
2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] info message

Any ideas?

@RFeiten I think if you change multiline.negate to true and add multiline.match: after, it should start recognizing multiline events.
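With negate: true and match: after, Filebeat appends every consecutive line that does not match the pattern to the previous line that does, so each event runs from one timestamp line up to the next. With negate: false the roles are reversed: the pattern describes continuation lines, so a timestamp pattern ends up marking the wrong lines. The Python sketch below models both settings on the second sample. It is a simplification that ignores max_lines and timeout, group_after is a hypothetical helper rather than a Filebeat API, and the negate=False run assumes match: after, which the earlier config did not set. Because the pattern is anchored with ^, the indented "responseDateTime" line does not match and stays inside the first event.

```python
import re


def group_after(lines, pattern, negate):
    """Simplified model of Filebeat multiline grouping with match: after."""
    regex = re.compile(pattern)
    events = []
    for line in lines:
        # negate=True: lines that do NOT match continue the previous event.
        # negate=False: lines that DO match continue the previous event.
        is_continuation = (regex.search(line) is not None) != negate
        if is_continuation and events:
            events[-1].append(line)
        else:
            events.append([line])
    return events


# Second log sample from the thread.
sample = [
    "2021-03-10T00:27:21.0691085+00:00 0HM730JRVJI5F:0000001F [ERR] error message",
    '  "applicationName": "",',
    '  "id": null,',
    '  "responseCode": 409,',
    '  "serverResponseCode": null,',
    '  "responseDateTime": "2021-03-10T00:27:21.046Z",',
    '  "responseMessage": "...",',
    '  "responseDescription": "..."',
    "}",
    "  at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)",
    "   at Microsoft.AspNetCore.Routing.EndpointMiddleware....",
    "2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] info message",
]

pattern = r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"

events = group_after(sample, pattern, negate=True)
print(len(events))  # 2: the whole ERR record, then the INF line

# For contrast: same pattern with negate=False, assuming match: after.
broken = group_after(sample, pattern, negate=False)
print(len(broken))  # 11: each non-timestamp line becomes its own event
```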

Thanks, that did the trick!

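  multiline.type: pattern
  # A new event starts at each line that begins with a timestamp...
  multiline.pattern: '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7}\+'
  # ...and every line that does NOT match (negate: true) is appended
  # to the preceding line that does (match: after).
  multiline.negate: true
  multiline.match: after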
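The final pattern is stricter than the one suggested earlier: it also requires exactly seven fractional-second digits and a + before the UTC offset. The Python sketch below checks what that means in practice, using the pattern with the dot escaped so it only matches a literal ".". The two hypothetical lines are invented for illustration and do not come from the logs above.

```python
import re

# Final pattern from the thread, with the '.' escaped.
FINAL = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7}\+")

lines = {
    # Header line from the sample log: starts a new event.
    "sample": "2021-03-10T00:27:21.0694640+00:00 0HM730JRVJI5F:0000001F [INF] info message",
    # Hypothetical: negative UTC offset.
    "negative_offset": "2021-03-10T00:27:21.0694640-03:00 0HM730JRVJI5F:0000001F [INF] info message",
    # Hypothetical: only three fractional-second digits.
    "short_fraction": "2021-03-10T00:27:21.069+00:00 0HM730JRVJI5F:0000001F [INF] info message",
}

for name, line in lines.items():
    print(name, bool(FINAL.match(line)))
```

If lines like the hypothetical ones can occur, they would not start a new event and would be merged into the previous one; a looser ending such as [+-] or \d{1,7} would be needed in that case.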