How to grok a multiline message?

(I've come across a number of similar questions, but either they don't match what I'm doing or they don't have answers.)

I've got Java log lines, to the end of which might be appended " | {<some JSON>}". Parsing the main bit of the line is fine, and splitting off the JSON at the end with "\|%{SPACE}%{GREEDYDATA:json_string}" works fine.

The problem comes when there is more than one line to the message, because it's also got a Java stacktrace. (The multiline stuff is done in Filebeat.) The GREEDYDATA eats the JSON string and the following stack trace lines, leading, not surprisingly, to a JSON parse failure when I feed json_string through the JSON filter. My reading is that it isn't supposed to do this, as GREEDYDATA is .*, and . isn't supposed to match a newline.

So, with multiline input like

<normal Java log line> | {<some JSON>}
<stack trace first line>
...
<stack trace last line>

what can I do to extract just the "{<some JSON>}" for later JSON parsing, leaving

<normal Java log line>
<stack trace first line>
...
<stack trace last line>

in one field?

To extract the JSON you can match one or more non-newline characters followed by a newline:

    grok { match => { "message" => "%{DATA} \| (?<json>[^
]+)
" } }

To remove it use a similar mutate+gsub.

Sorry, I suspect that's got a bit mangled by the time it gets to the browser? Or did you mean literal newlines, spreading the pattern across three lines?

Yes, literal newlines in the string.
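If it helps to see the idea outside Logstash, here is a rough sketch in plain Ruby, whose regex engine is, as far as I know, closely related to the one grok uses. The sample log line and the variable names are made up for illustration, and `\n` stands in for the literal newline in the grok pattern.

```ruby
# Hypothetical multiline message: a log line, " | ", some JSON, then a stack trace.
message = "2019-01-01 12:00:00 ERROR something failed | {\"id\": 1, \"tag\": \"a|b\"}\n" \
          "java.lang.RuntimeException: boom\n" \
          "\tat com.example.Foo.bar(Foo.java:42)"

# [^\n]+ matches up to, but not including, the first newline,
# so the stack trace lines are never part of the capture.
m = message.match(/\A[^|]*\|\s*(?<json_string>[^\n]+)/)
json_string = m[:json_string]
puts json_string
```

Because the character class excludes the newline, any `|` inside the JSON itself is still captured; only the end of the first line stops the match.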

Ah, now that I'd never have thought of. I'll give it a try - ta.

OK, so this works for me

    match => { "message" => [ "%{TIMESTAMP_ISO8601}%{SPACE}%{LOGLEVEL}%{SPACE}%{HOSTNAME} %{JAVACLASS} \[[^\]]*\]%{SPACE}[^|]*\|%{SPACE}(?<json_string>[^
]+)" ] }

The reason I need to parse all the preceding stuff is that there might be a vertical bar character in one of the other fields (specifically inside the field delimited by [...]), so I need to make sure I don't find an earlier "|" than the one I want.

But I'm now having trouble working out how to remove what's in "json_string" from "message". mutate/gsub as you suggest can be used to remove something that matches a regex, but I want to remove a known text string ... and what's in "json_string" could be anything at this point, there's certainly no reason to suppose that there isn't stuff inside it that would look like an invalid regex, or a valid one that matches the wrong thing.

I can't use a regex to match "from a | to a newline" because there's no way I can think of to persuade it to find the right | (there might be one in the [...] field and there might be several in the JSON).

Use ruby

    mutate { add_field => { "someField" => "foo bar | baz" } }
    grok { match => { "someField" => "^%{WORD} %{WORD} %{GREEDYDATA:theRest}" } }
    ruby {
        code => '
            s = event.get("someField")
            s.slice!(event.get("theRest"))
            event.set("someField", s)
        '
    }

will result in

 "someField" => "foo bar ",
   "theRest" => "| baz"
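Applied to the multiline case from the question, the same trick might look something like this in plain Ruby. This is only a sketch: `message` and `json_string` are hypothetical stand-ins for `event.get("message")` and `event.get("json_string")`, and it assumes the separator is exactly " | ".

```ruby
# Hypothetical values standing in for the event fields.
message = "INFO [a|b] did a thing | {\"re\": \".*(x|y)+\"}\n" \
          "java.lang.Exception: boom\n" \
          "\tat Foo.bar(Foo.java:1)"
json_string = "{\"re\": \".*(x|y)+\"}"

cleaned = message.dup
# With a String argument, slice! removes the first literal occurrence,
# so regex metacharacters inside the JSON are not interpreted.
cleaned.slice!(" | " + json_string)
puts cleaned
```

Note that only the first occurrence is removed, so this relies on the separator plus JSON text not appearing earlier in the message.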

Yeah, I'd just come to the conclusion that it'd have to be Ruby!

Thanks very much for your help.
