While instrumenting a new application for log monitoring via Filebeat, I came across a log format I've never seen before. It's a multiline format, but instead of the continuation lines simply following the first line, each continuation line repeats a copy of the log entry's header, as follows:
[Thread |2640|D] 2019-06-13 00:00:00.021-05:00 Line 1 of message:
[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 2 of message:
[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 3 of message
[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 4 of message
[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 5 of message
Taking the header [Thread |2640|D] from the first line as an example:
- Thread is obvious: it represents the thread name.
- 2640: This is actually a 2-byte hex number that appears to be a message ID. However, some IDs are duplicated between consecutive messages, so I'm not certain about this.
- D: This is a single-character code that appears to indicate message severity, i.e., DEBUG in this case. However, subsequent entries that are clearly part of the same message have a code of H, which appears to indicate a continuation line.
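A minimal Python sketch of splitting one line into those header fields with a regular expression, assuming the interpretations above hold (thread name, hex message ID, single-letter code, then a timestamp with a UTC offset). HEADER_RE, LogLine, and parse_line are hypothetical names for illustration only; this is not Filebeat or Logstash syntax, although a similar expression could plausibly be adapted into a grok pattern.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex for the assumed layout:
# [<thread> |<hex id>|<code>] <YYYY-MM-DD HH:MM:SS.fff+-HH:MM> <message text>
HEADER_RE = re.compile(
    r"^\[(?P<thread>[^|\]]+?)\s*\|(?P<msg_id>[0-9A-Fa-f]+)\|(?P<code>[A-Z])\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})\s+"
    r"(?P<text>.*)$"
)


class LogLine(NamedTuple):
    thread: str
    msg_id: int      # assumed to be hexadecimal, per the guess above
    code: str        # assumed severity letter, or "H" for a continuation line
    timestamp: str   # kept as a string; timezone handling is out of scope here
    text: str


def parse_line(line: str) -> Optional[LogLine]:
    """Return the header fields of one line, or None if it has no header."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return LogLine(
        thread=m.group("thread"),
        msg_id=int(m.group("msg_id"), 16),
        code=m.group("code"),
        timestamp=m.group("timestamp"),
        text=m.group("text"),
    )


if __name__ == "__main__":
    print(parse_line("[Thread |2640|D] 2019-06-13 00:00:00.021-05:00 Line 1 of message:"))
```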
My questions are:
- Is it possible to write a Filebeat multiline pattern that concatenates these lines? What would that pattern look like?
- Assuming I can write such a Filebeat multiline rule, how would I write a Logstash rule to strip the repeated headers from the continuation lines, if that's possible?
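The grouping and cleanup asked about above can be sketched in Python, assuming that an H in the third header field always marks a continuation line. group_events emulates what I understand Filebeat's multiline options to do with multiline.negate: false and multiline.match: after (lines matching multiline.pattern are appended to the preceding non-matching line, joined by newlines), and strip_repeated_headers performs the kind of regex replacement a Logstash mutate filter with gsub could apply to the merged message. The function names and both regular expressions are hypothetical and have not been tested against a real Filebeat or Logstash setup.

```python
import re
from typing import Iterable, List

# Assumed continuation marker: a header whose third field is "H".
# Roughly corresponds to a Filebeat setting such as:
#   multiline.pattern: '^\[[^|]*\|[0-9A-Fa-f]+\|H\]'
#   multiline.negate: false
#   multiline.match: after
CONTINUATION_RE = re.compile(r"^\[[^|]*\|[0-9A-Fa-f]+\|H\]")

# Assumed full copied header (thread, id, "H", timestamp) to remove from each
# continuation line; MULTILINE lets ^ match after every newline in the event.
HEADER_PREFIX_RE = re.compile(
    r"^\[[^|]*\|[0-9A-Fa-f]+\|H\] "
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2} ",
    re.MULTILINE,
)

# The five sample lines from the question.
SAMPLE_LINES = [
    "[Thread |2640|D] 2019-06-13 00:00:00.021-05:00 Line 1 of message:",
    "[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 2 of message:",
    "[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 3 of message",
    "[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 4 of message",
    "[Thread |2640|H] 2019-06-13 00:00:00.021-05:00 Line 5 of message",
]


def group_events(lines: Iterable[str]) -> List[str]:
    """Append each continuation line to the most recent non-continuation line."""
    events: List[str] = []
    for line in lines:
        if CONTINUATION_RE.match(line) and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


def strip_repeated_headers(event: str) -> str:
    """Remove the copied header from every continuation line of a merged event."""
    return HEADER_PREFIX_RE.sub("", event)


if __name__ == "__main__":
    for event in group_events(SAMPLE_LINES):
        print(strip_repeated_headers(event))
```

Because grouping keys only on the H code, this approach does not depend on the ID field, which seems safer given that IDs can repeat between consecutive messages.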