Parsing multiline stacktrace

RFeiten · February 2, 2021, 6:04pm

I'm having a hard time indexing logs that contain multi-line stack traces.

Here is a log sample:

2021-01-04T00:47:39.6082940+00:00 0HM5E5E2861GQ:00000005 [ERR] Something went wrong:(Value cannot be null. (Parameter 'source') at System.Linq.ThrowHelper.ThrowArgumentNullException(ExceptionArgument argument)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at ...
   at ...
   at ...
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
   at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
   at ...
2021-01-04T00:47:39.6086626+00:00 0HM5E5E2861GQ:00000005 [INF] Request finished in 517.8698ms 500 application/json (791a596a)
2021-01-04T00:47:42.5862067+00:00 0HM5E5E2861H7:00000001 [INF] Request starting HTTP/1.1 GET ...   (ca22a1cb)

Here is the grok configuration I have in my pipeline:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{NOTSPACE:thread_id} \[%{LOGLEVEL:log.level}\] %{GREEDYDATA:msg}" }
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss.SSSSSSSZZ", "ISO8601" ] #2021-01-04T00:47:39.6082940+00:00
      target => "@timestamp"
    }
    mutate {
      remove_field => [ "message", "logdate", "agent.hostname" ]
    }
}

Here is the multiline configuration I have in my Filebeat configuration. It follows the approach described in Manage multiline messages | Filebeat Reference [7.10] | Elastic:

  multiline.type: pattern
  multiline.pattern: '^[[:space:]]'
  multiline.negate: false
  multiline.match: after

Now, when invoking the pipeline, I'm just receiving lots of _grokparsefailure tags, and my data goes to Elasticsearch without the expected format.

{
    "@timestamp" => 2021-02-02T17:51:30.299Z,
         "agent" => {
        "hostname" => "xxx",
            "type" => "filebeat",
         "version" => "7.10.2"
    },
           "ecs" => {},
      "@version" => "1",
          "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_grokparsefailure"
    ],
         "input" => {
        "type" => "log"
    },
          "host" => {
        "name" => "xxx"
    },
           "log" => {
          "file" => {
            "path" => "/var/log/Log-20210101.txt"
        },
        "offset" => 3364
    }
}

The idea is to make the pipeline work for both INF and ERR entries, regardless of whether there are multiline entries in between. Any ideas?

Badger · February 2, 2021, 7:06pm

I would suggest that you change your filters to be

grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{NOTSPACE:thread_id} \[%{LOGLEVEL:log.level}\] %{GREEDYDATA:msg}" }
  remove_field => [ "message" ]
}
date {
  match => [ "logdate", "yyyy-MM-dd HH:mm:ss.SSSSSSSZZ", "ISO8601" ] #2021-01-04T00:47:39.6082940+00:00
  target => "@timestamp"
  remove_field => [ "logdate" ]
}
mutate {
  remove_field => [ "agent.hostname" ]
}

That way the [message] field does not get removed if grok fails to parse it, and you can look at the value of the field to see why it failed.
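
One way to look at the events that failed, sketched here as an assumption rather than something posted in this thread, is to print only the events tagged _grokparsefailure. The stdout output and the rubydebug codec are standard Logstash plugins; using them for this purpose is just a debugging convenience.

```
output {
  # Print only the events grok could not parse, so the untouched
  # [message] field can be inspected on the console.
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

Because remove_field inside grok is only applied when the match succeeds, the failed events printed this way still carry the original [message].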

RFeiten · February 2, 2021, 7:16pm

I've got it to work now, but only for the multiline occurrences. The other ones continue to give me errors.

The difference was putting "(?m)" in front of GREEDYDATA, so that it can also capture multiple lines. However, I'm still not able to parse the single-line ones...

filter {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{NOTSPACE:thread_id} \[%{LOGLEVEL:log.level}\] (?m)%{GREEDYDATA:msg}" }
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss.SSSSSSSZZ", "ISO8601" ] #2021-01-04T00:47:39.6082940+00:00
      target => "@timestamp"
    }
    mutate {
      remove_field => [ "logdate", "agent.hostname", "message" ]
    }
}

RFeiten · February 2, 2021, 7:36pm

@Badger I have implemented your suggestion, thanks for that.

By the way, I found the issue by debugging my grok with https://grokdebug.herokuapp.com/

The problem was that I was using \[%{LOGLEVEL:log.level}\] to match the log level. LOGLEVEL accepts the value "ERR", but it does not accept "INF".

This is why it was only working for the multiline events: those were the ERR entries carrying the stack traces.

To solve it, I simply changed it to \[%{DATA:log.level}\]

It doesn't make much sense in my opinion, but it's working.
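
The behaviour is consistent with a LOGLEVEL definition that allows ERR as an abbreviation but spells INFO out in full. That is an assumption about the pattern set in use here, and it is worth checking against the grok-patterns file shipped with your Logstash version. The thread never shows the final filter, so the following is only a sketch of what the pieces discussed above look like when combined: the (?m) flag, DATA in place of LOGLEVEL, and the remove_field placement Badger suggested.

```
filter {
    grok {
      # (?m) lets GREEDYDATA span the newlines that Filebeat's multiline
      # settings leave inside a joined stack trace event.
      # DATA accepts any text between the brackets, including INF and ERR.
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{NOTSPACE:thread_id} \[%{DATA:log.level}\] (?m)%{GREEDYDATA:msg}" }
      # Applied only when the match succeeds, so failed events keep [message].
      remove_field => [ "message" ]
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss.SSSSSSSZZ", "ISO8601" ]
      target => "@timestamp"
      # Applied only when the date was parsed successfully.
      remove_field => [ "logdate" ]
    }
    mutate {
      remove_field => [ "agent.hostname" ]
    }
}
```

DATA is deliberately permissive; it will capture whatever sits between the brackets, so unexpected level values end up in log.level instead of causing a _grokparsefailure.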

system · March 2, 2021, 7:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
