Java stacktrace multiline


#1

I'm trying to use multiline in Filebeat to parse a Java stack trace, as shown below, but I still have a hard time extracting and grouping all the data I need.

[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=1 rev=*** 
PRODUCT- URI: *****
2018-11-26T02:58:57.395-0800: [GC pause]
PRODUCT- URI: *****
[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=2 rev=*** 
PRODUCT- URI: *****
WARNING: ****
[2018-11-26 03:00:33,904] WARN  ****
[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception:  *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception: *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

Basically, I want to group the most recent HEARTBEAT line and all lines between "START...END" as a single event; anything between the HEARTBEAT line and "START" should be dropped.

What I want to get is:

[Mon Nov 26 02:58:42 PST 2018] HEARTBEAT count=2 rev=*** 
PRODUCT- URI: *****
WARNING: ****
[2018-11-26 03:00:33,904] WARN  ****
[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception:  *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

[Mon Nov 26 03:00:33 PST 2018] START: ****
Exception: *****
	at ****
	at ****
	... 139 more
Caused by: ****
	... 143 more
END:*****

My current filebeat config is:

filebeat.prospectors:
- type: log
  paths:
    - log.txt
  multiline.pattern: '^java\.|^[[:space:]]+(at\b|\.{3})|^Caused by:|^END:'
  multiline.negate: false
  multiline.match: after

processors:
  - drop_event:
      when:
        not:
          # keep only events that contain START or HEARTBEAT
          or:
            - contains:
                message: "START"
            - contains:
                message: "HEARTBEAT"

output.logstash:
  hosts: ["localhost:5044"]

When I use the configuration above, I can extract the HEARTBEAT line and "START...END" separately, but I am not able to group them together as a single event. The reason to group the most recent HEARTBEAT line with "START...END" is that I need some fields from the HEARTBEAT line to populate the exception instance for each "START...END".
Should I process HEARTBEAT and "START...END" separately, store the "HEARTBEAT" in an ES index, and look up the "HEARTBEAT" from ES when I receive "START...END"? Will there be a time delay in retrieving the most recent HEARTBEAT line?
What is the most appropriate way to handle this kind of stack trace?

Thanks a lot!


(Noémi Ványi) #2

The multiline reader of Filebeat is not capable of such aggregation. If I were you, I would try to use Logstash to aggregate events using the aggregate filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html
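For example, a rough sketch of that enrichment pattern with the aggregate filter (the `host`, `revisionNum`, and message markers are assumptions about your data, not tested against it):

```conf
filter {
  if "HEARTBEAT" in [message] {
    aggregate {
      # one shared map per app server; assumes a host field identifies it
      task_id => "%{host}"
      # remember the latest revisionNum from the HEARTBEAT line
      code => "map['revisionNum'] = event.get('revisionNum')"
    }
  } else if "START" in [message] {
    aggregate {
      task_id => "%{host}"
      # copy the stashed revisionNum onto this START...END event
      code => "event.set('revisionNum', map['revisionNum'])"
      # only enrich if a HEARTBEAT was already seen for this host
      map_action => "update"
    }
  }
}
```

Note that the aggregate filter only works correctly with a single pipeline worker (`-w 1`), so events are processed in order.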


#3

Thanks for your reply! I still have a hard time extracting the data I need. My current log format looks like this:

A (contains revisionNum, which needs to be used by event B)
B (a multiline Java stack trace, "START....END")
unwanted message (unstructured message)
B
A
B 
B
...

What I want to achieve is:
Save each B as a single document and use the "revisionNum" field from the most recent A.

I tried two approaches to this problem but each approach has its own problem.

Approach 1: Flush whenever A is seen, which means AB"unwantedMessage"B... is grouped as a single message, then use Grok to extract each B.

  • My question: Is it possible to extract multiple Bs out of the giant message, store them in an array, and then do some mutations on each B? Then save each B as a single event. It looks like the "metricize" filter, where each B shares the same metric from A. But can I really extract multiple instances of B?

Approach 2: Process A and B as separate events

  • In this case, I process A and B as separate events in Logstash. But how can I persist the information in A so that B can use it? Because there are multiple instances of B for the same A, the "aggregate" filter will group all Bs into a single event if I push the event whenever a new A is seen.

  • I saw people using Ruby to create class variables so that data can be preserved across events. But that requires Logstash to run with only one worker, which might reduce performance.

  • Is storing the data in a class variable the only way to persist data across multiple events? I'm wondering whether I can store a subfield in @metadata. Will @metadata be persistent? I basically want to store a temporary map, {appServer1 => A1, appServer2 => A2, ...}. Whenever I see A, update the A* value for the corresponding key appServer*.
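
For concreteness, here is a rough sketch of the class-variable trick I mean (the `appServer` and `revisionNum` field names are made up, and this only works with a single pipeline worker):

```conf
filter {
  if "HEARTBEAT" in [message] {
    ruby {
      # @@last_a is a class variable, so it survives across events;
      # this is only safe with pipeline.workers: 1
      code => "
        @@last_a ||= {}
        @@last_a[event.get('appServer')] = event.get('revisionNum')
      "
    }
  } else if "START" in [message] {
    ruby {
      # look up the most recent A for this app server
      code => "
        @@last_a ||= {}
        event.set('revisionNum', @@last_a[event.get('appServer')])
      "
    }
  }
}
```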

What do you think of the approaches above? Do I have to write a custom plugin to handle this kind of stack trace? Do you have any suggestions for solving this problem? Thanks in advance!