Parse multiline JSON object from Docker log

Hey there, I am trying to parse multiline JSON objects that Filebeat collects from Docker container JSON logfiles and then sends to Logstash. Specifically, they are Java stacktraces.
My output looks like this:

{"log":"08.10.2020 09:21:14.263 *ERROR* [thread] class.name Service /foo returned response with status 500 | [...]\n","stream":"stdout","time":"2020-10-08T09:21:14.263461178Z"},
{"log":"java.lang.RuntimeException: some error\n","stream":"stdout","time":"2020-10-08T09:21:14.263476378Z"},
{"log":"\u0009at someclass(foo.java:108)\n","stream":"stdout","time":"2020-10-08T09:21:14.263480955Z"}

...and so on.
I managed to pack these stacktraces into one event using Filebeat's multiline feature. Now I am struggling to parse this event's message field so that it looks like the regular stacktrace it is:

*ERROR* [thread] class.name Service /foo returned response with status 500 | [...]
java.lang.RuntimeException: some error
    at someclass(foo.java:108)
    at xxx
    [...]
... 1 common frames omitted
Caused by: xxx
... X common frames omitted

My relevant filters are:

filter {
    # Parse Java stacktraces.
    # (They come in as multiline collections of JSON objects and have to be
    # converted to valid JSON first. Note that mutate applies its operations
    # in a fixed internal order, with replace running before gsub, so each
    # step gets its own mutate block to guarantee the intended order.)
    if "multiline" in [log][flags] {
        mutate {
            # Append a comma after every closing brace.
            gsub => [ "message", "}", "}," ]
        }
        mutate {
            # Wrap everything in brackets to form a JSON array.
            replace => [ "message", "[%{message}]" ]
        }
        mutate {
            # Drop the trailing comma before the closing bracket.
            gsub => [ "message", "},]", "}]" ]
        }
        json {
            source => "message"
            target => "data"
        }
    }
}
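
To make the intent concrete, the three steps transform the message field roughly like this (schematic, whitespace condensed; my own illustration, not actual pipeline output):

    {"log":"..."}\n{"log":"..."}          # original multiline message
    {"log":"..."},\n{"log":"..."},        # after gsub: "}" -> "},"
    [{"log":"..."},\n{"log":"..."},]      # after replace: wrapped in brackets
    [{"log":"..."},\n{"log":"..."}]       # after gsub: "},]" -> "}]"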

With this admittedly ugly mutate chain, I at least get valid JSON, so the json plugin does not crash. data then looks like this:

[
    {
        "log": "08.10.2020 09:21:14.263 *ERROR* [thread] class.name Service /foo returned response with status 500 | [...]\n",
        "stream": "stdout",
        "time": "2020-10-08T09:21:14.263461178Z"
    },
    [...]
]

But where do I go from here? Somehow, I need to walk through this array entry by entry, take the value of each log key, and merge them back into one message.
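
One idea would be a ruby filter along these lines; this is an untested sketch, and the stacktrace target field is just a placeholder name:

    # Runs after the json filter above, inside the same multiline condition.
    # Joins the "log" value of every parsed object back into one plain-text
    # stacktrace (each value already ends with "\n").
    ruby {
        code => '
            entries = event.get("data")
            if entries.is_a?(Array)
                event.set("stacktrace", entries.map { |e| e["log"] }.join)
            end
        '
    }

But maybe there is a cleaner way to do this?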

Thanks in advance for your help. :slight_smile:

Turns out that Filebeat is way better suited to handle this. :slight_smile: I had been using a faulty Filebeat configuration that picked up the logs from the actual log path at /var/lib/docker//... - this is how it should look:

filebeat.autodiscover:
  providers:

    # Tomcat container(s)
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: tomcat
          config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
              fields_under_root: true
              processors:
                - add_locale: ~
              # Multiline pattern for Java stacktraces: a new event starts at a
              # timestamp like "08.10.2020 09:21:14.263"; every line that does
              # not match (negate: true) is appended to the preceding line
              # (match: after).
              multiline:
                negate: true
                pattern: '^\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3}'
                match: after

This way, the Docker JSON is already parsed correctly by Filebeat, and there is no need for the json filter (or the mutate workaround) in the Logstash pipeline.
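
To illustrate how the multiline settings group the lines (my own annotation, reusing the sample from above): the timestamped line matches the pattern and starts a new event, while every following line fails to match and is therefore appended to it:

    08.10.2020 09:21:14.263 *ERROR* [thread] class.name ...   <- matches pattern, starts a new event
    java.lang.RuntimeException: some error                    <- no match, appended
        at someclass(foo.java:108)                            <- no match, appended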
