Writing Filebeat Multiline Pattern for Unusual Log Format

DougR · June 13, 2019, 3:47pm

While instrumenting a new application for log monitoring via Filebeat, I came across a log format I've never seen before.

It's a multiline log format, but instead of continuing with bare text, each continuation line repeats the header of the log entry, as follows:

[Thread    |2640|D] 2019-06-13 00:00:00.021-05:00  Line 1 of message:
[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00    Line 2 of message:
[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00      Line 3 of message
[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00      Line 4 of message
[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00      Line 5 of message

Taking the following example of a line header:

[Thread    |2640|D]

Thread - The thread name.
2640 - A 2-byte hex number that appears to be a message ID. However, consecutive messages sometimes share the same value, so I'm not certain about this.
D - A single-character code that appears to be the message severity (DEBUG in this case). However, subsequent lines that are clearly part of the same message have a code of H, which appears to indicate a continuation line.

My questions are:

Is it possible to write a Filebeat multiline pattern to concatenate these lines? What would that pattern look like?
Assuming that I can write a Filebeat multiline rule to concatenate these lines, how would I write a Logstash rule to strip the repeated headers from the subsequent lines, if possible?

DougR · June 13, 2019, 3:59pm

It turns out I was incorrect regarding H designating a continuing entry (I haven't figured out what it designates yet).

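However, continuation lines are indented further after the line header, so the following multiline pattern, which matches only the first line of an entry (exactly two spaces between the timestamp and the text), works in the filebeat.yml: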

^\[.*\|[0-9a-fA-F]{4}\|[A-Z]\] +[0-9]{4}-[0-9]{2}-[0-9]{2} +[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}-[0-9]{2}:[0-9]{2}  [^ ]
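For anyone who wants to check the pattern before deploying it, here is a minimal Python sketch of how the grouping would behave. It assumes the pattern is used with multiline.negate: true and multiline.match: after (the post does not state which settings were used), it escapes the dot before the milliseconds, and it uses Python's re module as a stand-in for Filebeat's Go regexp engine. The second sample entry (ID 2641) and the group_events helper are made up for illustration.

```python
import re

# Pattern from the post (dot before the milliseconds escaped).
# It matches only lines with exactly two spaces before the text.
START = re.compile(
    r"^\[.*\|[0-9a-fA-F]{4}\|[A-Z]\] +[0-9]{4}-[0-9]{2}-[0-9]{2} +"
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}-[0-9]{2}:[0-9]{2}  [^ ]"
)


def group_events(lines):
    """Approximate negate=true, match=after grouping.

    A line that matches START begins a new event; any other line is
    appended to the event currently being built.
    """
    events = []
    for line in lines:
        if START.match(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


sample = [
    "[Thread    |2640|D] 2019-06-13 00:00:00.021-05:00  Line 1 of message:",
    "[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00    Line 2 of message:",
    "[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00      Line 3 of message",
    # Hypothetical second entry, not from the original log excerpt.
    "[Thread    |2641|D] 2019-06-13 00:00:00.050-05:00  Next message",
]

events = group_events(sample)
for event in events:
    print(repr(event))
```

Note that the pattern only accepts negative UTC offsets (the literal - before the offset), so it would need adjusting for logs written in a zone east of UTC.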

Additionally, in Logstash, I wrote the following mutate rule to remove subsequent line headers from the multiline entry. This is placed before the grok, which parses the entry, but I'm not certain that the location matters:

if "multiline" in [log][flags] {
    mutate {
      # Clean up the repeated headers on subsequent lines of a multiline event.
      # The pattern requires 4 spaces after the header, so the header on the
      # first line (followed by only 2 spaces) is left intact.
      gsub => ["message",
               "\[\S*\s*\|[0-9a-fA-F]{4}\|[\S]\]\s+[0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}-[0-9]{2}:[0-9]{2}\s{4}",
               "  "]
    }
}
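To see what the gsub does to a concatenated event, the same substitution can be sketched in Python. This is only an approximation: Logstash applies the pattern with Ruby's regex engine, and the sample message below is assembled by hand from the log excerpt at the top of the thread rather than taken from a real Filebeat event.

```python
import re

# Same pattern as in the gsub: a line header followed by at least 4 spaces.
HEADER = re.compile(
    r"\[\S*\s*\|[0-9a-fA-F]{4}\|[\S]\]\s+[0-9]{4}-[0-9]{2}-[0-9]{2}\s+"
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}-[0-9]{2}:[0-9]{2}\s{4}"
)

# A multiline event as Filebeat would join it: lines separated by newlines.
message = "\n".join([
    "[Thread    |2640|D] 2019-06-13 00:00:00.021-05:00  Line 1 of message:",
    "[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00    Line 2 of message:",
    "[Thread    |2640|H] 2019-06-13 00:00:00.021-05:00      Line 3 of message",
])

# Replace each header plus 4 spaces with 2 spaces. The first line has only
# 2 spaces after its header, so it does not match and keeps its header.
cleaned = HEADER.sub("  ", message)
print(cleaned)
```

The first line keeps its header and timestamp for the grok to parse, while the continuation lines keep only their relative indentation.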
