I have a not-so-straightforward request from a user to process Progress log files. The catch is that they want the log lines combined into multiline events following a specific pattern.
All lines that have exactly the same timestamp, P-id and T-id should end up in a single doc. As you can see from the example below, there are wild combinations of IDs and timestamps. For example, the three entries stamped [18/11/12@16:22:06.354+0100] P-040789 T-2835367808 should be in one doc.
[18/11/12@15:08:49.357+0100] P-035768 T-2833205120 2 AS AS Application Server connected with connection id: REDACTED
[18/11/12@16:22:06.352+0100] P-040789 T-2835367808 1 AS -- Logging level set to = 2
[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 1 AS -- Log entry types activated: ASPlumbing,DB.Connects
[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Starting application server for REDACTED. (5560)
[18/11/12@16:22:06.354+0100] P-040789 T-2835367808 2 AS AS Application Server Startup. (5473)
[18/11/12@16:22:06.434+0100] P-040789 T-2835367808 2 AS CONN Database master Options: (12699)
[18/11/12@16:22:06.435+0100] P-040789 T-2835367808 2 AS CONN Connected to database master, user number 88. (9543)
[18/11/12@16:22:06.442+0100] P-040789 T-2835367808 2 AS CONN Database temp-db Options: (12699)
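Every line starts with the same header: a bracketed timestamp, then P-<id> and T-<id>. As a rough sketch of the parsing step (in Logstash this would normally be a grok or dissect filter), the Python below pulls those three fields plus the rest of the message out of one line. The regex, the function name parse_line and the field names timestamp, pid, tid and message are hypothetical choices for illustration, not anything Progress or Logstash defines.

```python
import re

# Hypothetical pattern for the header of a Progress log line:
# [timestamp] P-<pid> T-<tid> <rest of the message>
HEADER = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return the header fields of one log line as a dict, or None if it does not match."""
    m = HEADER.match(line.strip())
    return m.groupdict() if m else None


sample = "[18/11/12@16:22:06.434+0100] P-040789 T-2835367808 2 AS CONN Database master Options: (12699)"
print(parse_line(sample))
```

Lines that do not match the pattern return None, so they can be routed elsewhere instead of being silently merged.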
I was thinking of shipping the log files with Beats to a dedicated pipeline in Logstash with a single worker and using the aggregate filter, but I can't wrap my head around where to start.
Is this even possible with the built-in logstash filters?
You should be able to do that with an aggregate filter. If you have already parsed the timestamp, P-id, and T-id, you can combine them into the task_id using sprintf references.
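To make the grouping logic concrete, here is a minimal Python sketch of what the aggregate approach does, assuming the header fields are already parsed and events arrive in file order (which is why the single pipeline worker matters). It builds a key from the three fields, much like a task_id such as "%{timestamp}-%{pid}-%{tid}", and emits the previous group whenever the key changes, similar in spirit to the aggregate filter's push_previous_map_as_event option. The function names task_id and aggregate and the field names are hypothetical; this is not Logstash configuration.

```python
from itertools import groupby


def task_id(event):
    # Hypothetical key, analogous to a sprintf task_id "%{timestamp}-%{pid}-%{tid}".
    # The "-" separators keep adjacent numeric fields from running together.
    return f"{event['timestamp']}-{event['pid']}-{event['tid']}"


def aggregate(events):
    """Merge consecutive events that share a task_id into one document.

    Assumes events are in file order; a new task_id closes the previous group.
    """
    docs = []
    for key, group in groupby(events, key=task_id):
        group = list(group)
        first = group[0]
        docs.append({
            "task_id": key,
            "timestamp": first["timestamp"],
            "pid": first["pid"],
            "tid": first["tid"],
            # Join the messages of all lines in the group into one multiline field.
            "message": "\n".join(e["message"] for e in group),
        })
    return docs
```

Because only consecutive lines are merged, two lines with the same key separated by a different key would end up in separate docs; the aggregate filter keyed purely on task_id (without push_previous_map_as_event) would instead keep them in one map until a timeout.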