Combining multiline entries based on a common unique ID

TJelastico · August 28, 2017, 11:52pm

Hi,

I have a single log file which contains several multi-line log formats that defy using a single set of multiline.pattern, multiline.match, and multiline.negate entries. Using multiple OR statements in the multiline.pattern entry doesn't help because I would need multiple match and negate entries.

However, fortunately, each multi-line log entry includes an ID that is unique to that particular multi-line log entry. Is there any way to look for a matching ThreadID/PID/etc. across multiple lines and group them into a single multi-line entry? Fortunately, they are all "in a row" and not mixed in with one another.

Below is a simplified example, where I would like to combine these 4 log entries, into 2 multi-line entries based on the unique "UniqueID" value given in each log file entry:

[01:28:07.357] UniqueID:111111 Muti-line entry 1
[01:28:07.358] UniqueID:111111 xxx Multiline Entry 2
[01:28:07.367] UniqueID:222222 Muti-line entry 1
[01:28:07.368] UniqueID:222222 xxxx Multiline Entry 2

FWIW, I'm using Filebeat on Linux to send the logs to Graylog, so if Filebeat isn't able to do this, I'm open to any other ideas for combining log entries based on a unique ID.

Thanks for your help!

steffens · August 29, 2017, 12:11pm

Do you have a more real-world example? E.g. what does the xxx stand for? Is there still some common pattern like additional spaces/tabs? How do you know a multiline pattern is finished? Is there some common final line indicator, or do you go just by IDs? Given the IDs, can the lines of multiple events become intermixed?

You will need to parse and have some kind of join/correlation operation on the Unique ID. E.g. the Logstash aggregate filter might help.
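
As a rough illustration of such a join on the unique ID, here is a minimal Python sketch. It is not a Filebeat or Logstash feature, just the idea in plain code. It assumes the simplified `UniqueID:<digits>` format from the first post and that the lines of one event are never intermixed with another; the names `unique_id` and `group_by_unique_id` are hypothetical.

```python
import re
from itertools import groupby

# Assumption: every line carries "UniqueID:<digits>", as in the simplified example.
ID_RE = re.compile(r"UniqueID:(\d+)")

def unique_id(line):
    """Return the UniqueID found in a line, or None if the line has none."""
    m = ID_RE.search(line)
    return m.group(1) if m else None

def group_by_unique_id(lines):
    """Join each run of consecutive lines sharing a UniqueID into one entry."""
    return ["\n".join(run) for _, run in groupby(lines, key=unique_id)]

sample = [
    "[01:28:07.357] UniqueID:111111 Muti-line entry 1",
    "[01:28:07.358] UniqueID:111111 xxx Multiline Entry 2",
    "[01:28:07.367] UniqueID:222222 Muti-line entry 1",
    "[01:28:07.368] UniqueID:222222 xxxx Multiline Entry 2",
]

entries = group_by_unique_id(sample)
for entry in entries:
    print(entry)
    print("---")
```

Because `groupby` only merges neighbouring lines, this relies on the lines of one event staying "in a row". Consecutive lines without any ID would also be merged into one entry.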

TJelastico · August 29, 2017, 8:31pm

Thanks Steffens,

I've included a real-world sample below, and I would be grateful for any insight on how I could accomplish this using regexes, but the only commonality I see is the unique ID given to each multi-line entry.

No, the lines with unique IDs cannot become intermixed, fortunately! That's the only saving grace in this mess of a log file.

The Unique ID in the sample below starts with {http--8000-

Thanks.
FYI, I did shorten some of these entry examples just to make them more readable, but the first and last lines, and the structure of the middle, were left untouched.

[00:54:04.866] {http--8000-15$268023904} Serious error occurrred: java.lang.NullPointerException
[00:54:04.866] {http--8000-15$268023904} java.lang.NullPointerException
[00:54:04.866] {http--8000-15$268023904} Error: java.lang.NullPointerException
[00:54:04.866] {http--8000-15$268023904}     
[00:55:47.533] {Timer-6} 8/11/17 12:55 AM | SessionCache.Perge - 0ms (75/171) n=30
[00:55:56.359] {DefaultQuartzScheduler_QuartzSchedulerThread} 00:55:56.359 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG org.quartz.core.QuartzSchedulerThread - batch acquisition of 0 triggers
[01:11:33.155] {http--8000-10$1894935270} boards.exceptions.RedirectException
[01:11:33.155] {http--8000-10$1894935270}       at boards.request.Request.redirect(Request.java:703)
[01:11:33.156] {http--8000-10$1894935270}       at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:743)
[01:11:33.156] {http--8000-10$1894935270}       at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:662)
[01:11:33.156] {http--8000-10$1894935270}       at java.lang.Thread.run(Thread.java:619)
[01:28:02.403] {DefaultQuartzScheduler_QuartzSchedulerThread} 01:28:02.403 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG org.quartz.core.QuartzSchedulerThread - batch acquisition of 0 triggers
[01:28:07.357] {http--8000-4$1868584300} Error: com.caucho.java.JavaCompileException: /boards/test/realcategorystats.jsp:42: cannot find symbol
[01:28:07.357] {http--8000-4$1868584300} symbol  : method getRealStatsURL(java.lang.String)
[01:28:07.357] {http--8000-4$1868584300} location: class boards.util.URL
[01:28:07.357] {http--8000-4$1868584300}       out.print(( URL.getRealStatsURL(mr.getParameter(Schema.TEST_ID))));
[01:28:07.357] {http--8000-4$1868584300}                      ^
[01:28:07.357] {http--8000-4$1868584300} 1 error
[01:28:07.357] {http--8000-4$1868584300}        at com.caucho.java.AbstractJavaCompiler.run(AbstractJavaCompiler.java:102)
[01:28:07.357] {http--8000-4$1868584300}        at java.lang.Thread.run(Thread.java:619)
[01:28:07.357] {http--8000-4$1868584300} 
[01:28:28.923] {DefaultQuartzScheduler_QuartzSchedulerThread} 01:28:28.923 [DefaultQuartzScheduler_QuartzSchedulerThread] DEBUG org.quartz.core.QuartzSchedulerThread - batch acquisition of 0 triggers
[01:38:02.776] {http--8000-20$2105617913} Error: java.lang.NumberFormatException: For input string: "5 and 1=1"
[01:38:02.776] {http--8000-20$2105617913}       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
[01:38:02.776] {http--8000-20$2105617913}       at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:273)
[01:38:02.776] {http--8000-20$2105617913}       at java.lang.Thread.run(Thread.java:619)
[01:38:02.776] {http--8000-20$2105617913} 
[01:38:13.803] {http--8000-18$1403411429} Error: java.lang.NumberFormatException: For input string: "5 or (1,2)=(select*from(select name_const(CHAR(111,108,111,108,111,115,104,101,114),1),name_const(CHAR(111,108,111,108,111,115,104,101,114),1))a) -- and 1=1"
[02:43:46.351] {http--8000-19$302001047} No random.  Size: 5

steffens · August 30, 2017, 2:03pm

Whoa, what's this? Looking for common structures, this sample already looks like 6 different log formats embedded into another log format. Are some background processes' logs getting captured into the final log?

This looks like a case for Logstash plus the aggregate filter. The problem with the aggregate filter and this file format is that the 'id' might not be enough if you have multiple files or multiple hosts producing these kinds of logs.

With just one file but multiple hosts, I'd also consider running Logstash on the actual host, without Beats, just to do the initial aggregation.
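
To make the multiple-files/multiple-hosts concern concrete, here is a hedged Python sketch (not Logstash configuration) that keeps one open event per source, so identical IDs arriving from different hosts are not merged. The function name `aggregate_per_source`, the record layout, and the host and path values are all assumptions made up for illustration.

```python
def aggregate_per_source(records):
    """Group lines by id separately for each (host, path) source.

    `records` is an iterable of (host, path, event_id, line) tuples.
    Sources may be interleaved, but within one source the lines of an
    event are assumed to be consecutive.
    Yields (host, path, event_id, lines) tuples.
    """
    open_events = {}  # (host, path) -> (event_id, [lines])
    for host, path, event_id, line in records:
        source = (host, path)
        current = open_events.get(source)
        if current is not None and current[0] != event_id:
            # The id changed for this source: the previous event is complete.
            yield (host, path, current[0], current[1])
            current = None
        if current is None:
            current = (event_id, [])
            open_events[source] = current
        current[1].append(line)
    # Emit whatever is still open once the input ends.
    for (host, path), (event_id, lines) in open_events.items():
        yield (host, path, event_id, lines)

# Hypothetical hosts and path; the id and messages are taken from the sample above.
LOG = "/var/log/app.log"
ID = "{http--8000-15$268023904}"
records = [
    ("web1", LOG, ID, "java.lang.NullPointerException"),
    ("web2", LOG, ID, "boards.exceptions.RedirectException"),
    ("web1", LOG, ID, "Error: java.lang.NullPointerException"),
    ("web2", LOG, ID, "      at java.lang.Thread.run(Thread.java:619)"),
]
events = list(aggregate_per_source(records))
```

Keying on the bare id alone would have merged all four lines into a single event here; keying the state on (host, path) keeps the two hosts apart.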

Using grok you can parse the timestamp and the 'id'. As you can have multiple kinds of IDs, just match the complete ID, including braces, as a string. E.g. this grok pattern \[%{TIME:time}\] (?<id>\{.*\}) %{GREEDYDATA:line} gets you the fields time, id and line (GREEDYDATA rather than DATA, because an unanchored DATA at the end of a pattern matches the empty string). Btw., you are missing the date; I hope you can extract the date from the file name. You can give the grok pattern a try here: https://grokdebug.herokuapp.com
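
For readers without Logstash at hand, the same split into time, id and line can be approximated with a plain Python regular expression. This is only a sketch, not grok itself: the timestamp is matched with an explicit `HH:MM:SS.mmm` pattern instead of `%{TIME}`, the id uses `[^}]*` instead of `.*` so that a closing brace later in the message is not swallowed, and the message is captured greedily to the end of the line. The function name `parse_line` is hypothetical.

```python
import re

# Assumption: every complete line starts with "[HH:MM:SS.mmm] {id} ".
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\] "  # timestamp in square brackets
    r"(?P<id>\{[^}]*\})"                         # complete id, braces included
    r"(?: (?P<line>.*))?$"                       # rest of the line, may be empty
)

def parse_line(raw):
    """Return a dict with time, id and line, or None if the prefix is missing."""
    m = LINE_RE.match(raw)
    if m is None:
        return None
    return {
        "time": m.group("time"),
        "id": m.group("id"),
        "line": m.group("line") or "",
    }

example = parse_line(
    "[01:11:33.155] {http--8000-10$1894935270}"
    "       at boards.request.Request.redirect(Request.java:703)"
)
print(example)
```

Lines that do not start with the bracketed timestamp and id return None, so the caller can decide whether to append them to the previous event or drop them.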

The aggregate filter allows you to store intermediate results in a map and finally emit an event once the id changes. As I have never used the filter myself, you should check out its documentation. The docs also contain some samples.
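
The "buffer lines, emit once the id changes" idea can be sketched in a few lines of Python. This is not the aggregate filter's configuration or API, only an illustration of the behaviour described above; because the lines of one event are never intermixed here, a single buffer is used instead of a map. The class name `IdAggregator` and its methods are hypothetical.

```python
class IdAggregator:
    """Collect consecutive lines with the same id; emit when the id changes."""

    def __init__(self):
        self.current_id = None
        self.buffer = []

    def feed(self, event_id, line):
        """Add one parsed line; return a finished (id, lines) event or None."""
        finished = None
        if self.buffer and event_id != self.current_id:
            finished = (self.current_id, self.buffer)
            self.buffer = []
        self.current_id = event_id
        self.buffer.append(line)
        return finished

    def flush(self):
        """Emit whatever is still buffered, e.g. at end of file or on a timeout."""
        if not self.buffer:
            return None
        finished = (self.current_id, self.buffer)
        self.current_id, self.buffer = None, []
        return finished

# (id, line) pairs as they might come out of the parsing step.
parsed = [
    ("{http--8000-15$268023904}", "java.lang.NullPointerException"),
    ("{http--8000-15$268023904}", "Error: java.lang.NullPointerException"),
    ("{Timer-6}", "8/11/17 12:55 AM | SessionCache.Perge - 0ms (75/171) n=30"),
    ("{http--8000-10$1894935270}", "boards.exceptions.RedirectException"),
    ("{http--8000-10$1894935270}", "      at java.lang.Thread.run(Thread.java:619)"),
]

agg = IdAggregator()
events = []
for event_id, line in parsed:
    done = agg.feed(event_id, line)
    if done is not None:
        events.append(done)
last = agg.flush()
if last is not None:
    events.append(last)
```

One limitation of this approach: two unrelated events that are adjacent and share the same id (for example repeated scheduler lines) would be merged into one, and the last event is only emitted on `flush`, so a live tail would need some timeout.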

TJelastico · August 30, 2017, 8:14pm

Thanks Steffens,

I really appreciate the guidance. It sounds complicated. Before I dig into that, I'm going to see if I can get the developers to write more consistent log files. I was hoping for an "easy" tool I could use client side, but it sounds like this will be kind of involved.

If anyone knows of any other (non-Elastic) tools, I'd appreciate a heads-up as well.

Thanks!

system · September 18, 2017, 11:52pm

This topic was automatically closed after 21 days. New replies are no longer allowed.
