Java stacktrace multiline


#1

I'm trying to use multiline in Filebeat to parse a Java stack trace, as shown below, but I still have a hard time extracting and grouping all the data I need.

[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=1 rev=*** 
PRODUCT- URI: *****
2018-11-26T02:58:57.395-0800: [GC pause]
PRODUCT- URI: *****
[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=2 rev=*** 
PRODUCT- URI: *****
WARNING: ****
[2018-11-26 03:00:33,904] WARN  ****
[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception:  *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception: *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

Basically, I want to group the most recent HEARTBEAT line and all lines between "START...END" as a single event; anything between the HEARTBEAT line and "START" should be dropped.

What I want to get is:

[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=2 rev=*** 
PRODUCT- URI: *****
WARNING: ****
[2018-11-26 03:00:33,904] WARN  ****
[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception:  *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception: *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

My current filebeat config is:

filebeat.prospectors:
- type: log
  paths:
    - log.txt
  multiline.pattern: '^java\.|^[[:space:]]+(at\b|\.{3})|^Caused by:|^END:'
  multiline.negate: false
  multiline.match: after

processors:
  - drop_event:
      when:
        not:
          # keep only events that contain START or HEARTBEAT
          or:
            - contains:
                message: "START"
            - contains:
                message: "HEARTBEAT"

output.logstash:
  hosts: ["localhost:5044"]

When I use the configuration above, I can extract the HEARTBEAT line and "START...END" separately, but I am not able to group them together as a single event. The reason to group the most recent HEARTBEAT line with "START...END" is that I need some fields from the HEARTBEAT line to populate the exception instance for each "START...END".
Should I process HEARTBEAT and "START...END" separately, store the "HEARTBEAT" in an ES index, and look up the "HEARTBEAT" from ES when I receive "START...END"? Will there be a time delay in retrieving the most recent HEARTBEAT line?
What is the most appropriate way to handle this kind of stack trace?

Thanks a lot!


(Noémi Ványi) #2

The multiline reader of Filebeat is not capable of such aggregation. If I were you, I would try to use Logstash to aggregate events using the aggregate filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html
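For example, a rough sketch of that enrichment pattern with the aggregate filter (the `host`, `revisionNum`, and message markers are assumptions about your data, not tested against it):

```conf
filter {
  if "HEARTBEAT" in [message] {
    aggregate {
      # one shared map per app server; assumes a host field identifies it
      task_id => "%{host}"
      # remember the latest revisionNum from the HEARTBEAT line
      code => "map['revisionNum'] = event.get('revisionNum')"
    }
  } else if "START" in [message] {
    aggregate {
      task_id => "%{host}"
      # copy the stashed revisionNum onto this START...END event
      code => "event.set('revisionNum', map['revisionNum'])"
      # only enrich if a HEARTBEAT was already seen for this host
      map_action => "update"
    }
  }
}
```

Note that the aggregate filter only works correctly with a single pipeline worker (`-w 1`), so events are processed in order.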


#3

Thanks for your reply! I still have a hard time extracting the data I need. My current log format looks like this:

A (contains revisionNum, which needs to be used by event B)
B (a multiline Java stack trace, "START....END")
unwanted message (unstructured message)
B
A
B 
B
...

What I want to achieve is:
Save each B as a single document and use the "revisionNum" field from the most recent A.

I tried two approaches to this problem but each approach has its own problem.

Approach 1: Flush whenever A is seen, which means AB"unwantedMessage"B... is grouped as a single message, then use Grok to extract each B.

  • My question: Is it possible to extract multiple Bs out of the giant message, store them in an array, and then do some mutations on each B? Then save each B as a single event. It looks like the "metricize" filter, where each B shares the same metric from A. But can I really extract multiple instances of B?

Approach 2: Process A and B as separate events

  • In this case, I process A and B as separate events in Logstash. But how can I persist the information in A so that B can use it? Because there are multiple instances of B for the same A, the "aggregate" filter will group all Bs into a single event if I push the event whenever a new A is seen.

  • I saw people using Ruby to create class variables so that data can be preserved across events. But that requires Logstash to run with only one worker, which might reduce performance.

  • Is storing the data in a class variable the only way to persist data across multiple events? I'm wondering whether I can store a subfield in @metadata. Will @metadata be persistent? I basically want to store a temporary map, {appServer1 => A1, appServer2 => A2, ...}. Whenever I see A, update the A* value for the corresponding key appServer*.
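
For concreteness, here is a rough sketch of the class-variable trick I mean (the `appServer` and `revisionNum` field names are made up, and this only works with a single pipeline worker):

```conf
filter {
  if "HEARTBEAT" in [message] {
    ruby {
      # @@last_a is a class variable, so it survives across events;
      # this is only safe with pipeline.workers: 1
      code => "
        @@last_a ||= {}
        @@last_a[event.get('appServer')] = event.get('revisionNum')
      "
    }
  } else if "START" in [message] {
    ruby {
      # look up the most recent A for this app server
      code => "
        @@last_a ||= {}
        event.set('revisionNum', @@last_a[event.get('appServer')])
      "
    }
  }
}
```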

What do you think of the approaches above? Do I have to write a custom plugin to handle this kind of stack trace? Do you have any suggestions for solving this problem? Thanks in advance!