Multiline events based on identical fields

WDK · June 3, 2020, 1:52pm

Hi all,

I have a not-so-straightforward request from a user to process progress log files. The catch is that they want the log files multilined in a specific pattern.
All lines that have the exact same timestamp, P-id and T-id should be in a single doc. As you can see from the example below, there are wild combinations of IDs and timestamps. The three lines stamped 16:22:06.354, for example, should be in one doc.

[18/11/12@15:08:49.357+0100] P-035768 T-2833205120 2 AS AS Application Server connected with connection id: REDACTED

[18/11/12@15:08:49.376+0100] P-035768 T-2833205120 1 AS -- Log entry types activated: Db.Connects:2

[18/11/12@16:22:06.352+0100] P-040789 T-2835367808 1 AS -- Logging level set to = 2

[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 1 AS -- Log entry types activated: ASPlumbing,DB.Connects
[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Starting application server for REDACTED. (5560)
[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Application Server Startup. (5473)

[18/11/12@16:22:06.434+0100] P-040789 T-2835367808 2 AS CONN Database master Options: (12699)

[18/11/12@16:22:06.435+0100] P-040789 T-2835367808 2 AS CONN Connected to database master, user number 88. (9543)

[18/11/12@16:22:06.442+0100] P-040789 T-2835367808 2 AS CONN Database temp-db Options: (12699)

I was thinking of sending the log files with Beats to a dedicated Logstash pipeline with a single worker and using the aggregate filter, but I can't wrap my head around where to start.
Is this even possible with the built-in Logstash filters?

Badger · June 3, 2020, 2:04pm

You should be able to do that with an aggregate filter. If you have already parsed the timestamp, P-id, and T-id you can combine them as the task_id using sprintf references

task_id => "%{timestamp} %{P-id} %{T-id}"

or else dissect them off

dissect { mapping => { "message" => "%{task_id} %{+task_id} %{+task_id} %{}" } }
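As a rough illustration of what that dissect mapping produces, here is a plain-Ruby sketch (not Logstash code; `task_id_for` is a hypothetical helper). It joins the first three space-separated tokens of a line, which is what the `%{task_id} %{+task_id} %{+task_id}` append syntax yields when the delimiter is a single space.

```ruby
# Hypothetical helper mimicking the dissect mapping above:
# the first three space-delimited tokens become the task_id,
# and the rest of the line is ignored (like the trailing %{}).
def task_id_for(line)
  line.split(" ", 4).first(3).join(" ")
end

line = "[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Application Server Startup. (5473)"
puts task_id_for(line)
```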

Then configure the aggregate filter to flush after a timeout, as in example 3 of the aggregate filter documentation.

In the code option, either create an array

code => '
    map["messages"] ||= []
    map["messages"] << event.get("message")
'

or concatenate them. Whatever works for you.
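To see the array variant in isolation, here is a minimal plain-Ruby sketch that stands in for the aggregate filter by keeping one map per task_id. The `maps` hash and the way the task_id is derived are assumptions for illustration only; in Logstash the filter manages the map for you.

```ruby
# Plain-Ruby stand-in for the aggregate filter: one map per task_id.
lines = [
  "[18/11/12@16:22:06.352+0100] P-040789 T-2835367808 1 AS -- Logging level set to = 2",
  "[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 1 AS -- Log entry types activated: ASPlumbing,DB.Connects",
  "[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Application Server Startup. (5473)",
]

# A new, empty map is created the first time a task_id is seen.
maps = Hash.new { |hash, task_id| hash[task_id] = {} }

lines.each do |message|
  # Timestamp, P-id and T-id are the first three space-separated tokens.
  task_id = message.split(" ", 4).first(3).join(" ")
  map = maps[task_id]
  # Same two statements as in the code option above.
  map["messages"] ||= []
  map["messages"] << message
end

maps.each { |task_id, map| puts "#{task_id}: #{map["messages"].length} line(s)" }
```

Concatenating instead would mean starting from an empty string and appending each message with a separator of your choice.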

WDK · June 8, 2020, 6:41am

Thanks for the directions! I'll try that.

WDK · June 9, 2020, 11:32am

I finally got the aggregation to "sorta" work with the following:

aggregate {
  task_id => "%{progress_timestamp} %{progress_pid} %{progress_thread}"
  code => "
    map['progress_message'] ||= []
    map['progress_message'] << event.get('message')
  "
  push_previous_map_as_event => true
  timeout => 10
}

It does seem to trip on the timestamp though, since I see logs with different timestamps getting aggregated:

[20/06/09@13:25:06.126+0200] P-082061 T-2238457728 1 AS QRX-DEBUG REDACTED, [20/06/09@13:25:06.126+0200] P-082061 T-2238457728 1 AS QRX-DEBUG START HandleRequest: , [20/06/09@13:25:06.289+0200] P-082061 T-2238457728 1 AS QRX-DEBUG ------------------------------------------------------------------------------------------------------------------------------, [20/06/09@13:25:06.289+0200] P-082061 T-2238457728 1 AS QRX-DEBUG ], "info":{"changesOnly":

Looking at the debug logs, I can see the line below. Is this because the timestamp varies and the P and T values do not, or is there another issue?

[2020-06-09T13:27:00,329][DEBUG][logstash.filters.aggregate] Aggregate create_timeout_event call with task_id '%{progress_timestamp} P-124198 T-242648960'
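The literal `%{progress_timestamp}` in that task_id indicates the sprintf reference was not substituted, which is what Logstash does when the referenced field is missing from the event. A minimal plain-Ruby sketch of the consequence follows. `expand` is a hypothetical stand-in for sprintf substitution, not a Logstash API, and the events are cut-down illustrations built from the aggregated output above.

```ruby
# Hypothetical stand-in for sprintf substitution: a reference to a
# missing field is left in place as literal text.
def expand(template, event)
  template.gsub(/%\{([^}]+)\}/) do
    name = Regexp.last_match(1)
    event.key?(name) ? event[name].to_s : "%{#{name}}"
  end
end

template = "%{progress_timestamp} %{progress_pid} %{progress_thread}"

# Neither event has a progress_timestamp field (assumed for illustration),
# although the messages carry different timestamps.
first = {
  "message" => "[20/06/09@13:25:06.126+0200] P-082061 T-2238457728 1 AS QRX-DEBUG START HandleRequest:",
  "progress_pid" => "P-082061", "progress_thread" => "T-2238457728",
}
second = {
  "message" => "[20/06/09@13:25:06.289+0200] P-082061 T-2238457728 1 AS QRX-DEBUG ],",
  "progress_pid" => "P-082061", "progress_thread" => "T-2238457728",
}

puts expand(template, first)
puts expand(template, first) == expand(template, second)
```

With the timestamp part reduced to the same literal text on every event, only the P and T values distinguish tasks, so lines with different timestamps end up in one map.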

WDK · June 10, 2020, 10:44am

@Badger just FYI.

With the following aggregate block it seems to be solid.

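aggregate {
  # One map per unique timestamp / P-id / T-id combination.
  task_id => "%{progress_raw_timestamp} %{progress_pid} %{progress_thread}"
  code => "
    # Collect every line of the group, each terminated by a newline (10.chr).
    map['progress_message'] ||= []
    map['progress_message'] << event.get('message') + 10.chr
    # Copy the fields the combined event should carry.
    map['pid'] ||= event.get('progress_pid')
    map['thread'] ||= event.get('progress_thread')
    map['loglevel'] ||= event.get('progress_loglevel')
    map['environment'] ||= event.get('progress_environment')
    map['host'] = event.get('host')
    map['agent'] = event.get('agent')
    map['log'] = event.get('log')
  "
  # Emit the map as a new event once the task has timed out.
  push_map_as_event_on_timeout => true
  timeout_tags => ['_aggregatetimeout']
  timeout => 10
}
# Keep only the aggregated events and drop the original single-line events.
if "_aggregatetimeout" not in [tags] {
  drop {}
}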
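Two Ruby details in that code block are easy to miss: `||=` keeps the value from the first event of a group while plain `=` keeps the value from the last, and `10.chr` is a newline character. A small plain-Ruby sketch with made-up field values (plain hashes stand in for Logstash events here):

```ruby
# Made-up events; plain hashes stand in for Logstash events.
events = [
  { "message" => "first line",  "progress_loglevel" => "1", "host" => "a" },
  { "message" => "second line", "progress_loglevel" => "2", "host" => "b" },
]

map = {}
events.each do |event|
  map["progress_message"] ||= []
  map["progress_message"] << event["message"] + 10.chr
  map["loglevel"] ||= event["progress_loglevel"]  # first value wins
  map["host"] = event["host"]                     # last value wins
end

p map["progress_message"]
p map["loglevel"]
p map["host"]
```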

system · July 8, 2020, 10:45am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
