Hello everyone, I have a log file which basically doesn’t follow the format of common log files. Normally in the known log like Apache, you have an event logged in one line. Mine, however, spread out through multiline, kinda like COBOL logs. I already tried multiline codec to parse this but it ain’t helping much. I want to extract some information, I'll detail it at the end of the topic.
This topic is fairly long and complicated imo, so I really really appreciated your time and your opinion on this because I've never seen any log like this before. Feel free to quote me on anything if you don't get it. I'll be happy to buy you a coffee for sorting this out
Here is an example of the structure of the log.
*****************************************************************
* Chain ABC (UNIX) *
*****************************************************************
JOB: ABC DATE : 15/07/2019 12:15:42
*-------------------------- STEP000 ---------------------------*
* Initialization *
*---------------------------------------------------------------*
AA100PPP-STOC0001-------------------------------------
AA100PPP-STOC0002-PROGRAM IDENTIFICATION : PPP/2.000/2018-01-01/F/
AA100PPP-STOC0003-START PROCESSING – START TIMESTAMP : 2019-07-15-12.45.00
AA100PPP-STOC0004-....+....1....+....2....+....3....+....4....+....5
…
AA100PPP-STOC0010-*** PPP : END SUCCESSFULLY – END TIMESTAMP : 2019-07-15-12.46.00
*-------------------------- STEP001 ---------------------------*
* Configuration *
*---------------------------------------------------------------*
…
*-------------------------- STEP002 ---------------------------*
* Execution *
*---------------------------------------------------------------*
…
*-------------------------- STEP003 ---------------------------*
* Computing *
*---------------------------------------------------------------*
BBCPTXYZ-STOC0001-------------------------------------
BBCPTXYZ-STOC0002-PROGRAM IDENTIFICATION : TY1/2.000/2018-01-01/F/
*CD00TY1-STOC0002-PROGRAM IDENTIFICATION : XYZ/2.000/2018-02-14/F/
*CD00TY1-STOC0003-START PROCESSING – START TIMESTAMP : 2019-07-15-12.51.00
…
BBCPTXYZ-STOC0001-------------------------------------
*CD00TY1-STOC0002-PROGRAM IDENTIFICATION : OPM/2.000/2018-02-14/F/
*CD00TY1-STOC0003-START PROCESSING – START TIMESTAMP : 2019-07-15-12.53.00
BBCPTOPM-STOC0001-------------------------------------
BBCPTOPM-STOC0002-FILE EXTRACTED : 000000000002
*CD00TY1-STOC0010-*** XYZ : END SUCCESSFULLY - END TIMESTAMP : 2019-07-15-12.55.00
…
JOB: ABC DATE : 15/07/2019 12:58:34
*---------------------------------------------------------------*
* End job *
*---------------------------------------------------------------*
Problem 1:
As you can see we have two lines defined the job ABC and two timestamps global of the whole job. The first one is start time and the second one if end time. Unfortunately, there is no way to tell the beginning and the end because they have the same format. I want the job, the timestamps added to every event extracted from this file.
Problem 2:
The content of the log file is divided by the block named STEP. For each step you have a main program identified by last three characters of the first eight characters at the first line of step and eventually many mini-program, I will talk about this in Problem 3 following.
For example in STEP000
AA100PPP-STOC0001-------------------------------------
PPP from AA100PPP is the main program of this step. It has two local timestamps, beginning and end time. With this information, it creates an event. In a nutshell, an event is defined like this:
{
job: ABC
timestampstartglobal: 15/07/2019 12:15:42
timestampendglobal: 15/07/2019 12:58:34
process: PPP
timestampstartprocess: 2019-07-15-12.45.00
timestampendprocess: 2019-07-15-12.46.00
}
Problem 3:
In STEP003, the main program XYZ called mini program TY1. TY1 is now responsible to manage all the program in this step. We have here multiple PROGRAM IDENTIFICATION / START PROCESSING – START TIMESTAMP for each program. So for this step, the timestamps of XYZ is all I cared about but this info is chopped into 2 lines and somehow I need to tell a parser to take the start timestamp of XYZ at the line right under PROGRAM IDENTIFICATION
XYZ should take start timestamp from this line:
*CD00TY1-STOC0003-START PROCESSING – START TIMESTAMP : 2019-07-15-12.51.00
not the line of the mini program OPM
*CD00TY1-STOC0003-START PROCESSING – START TIMESTAMP : 2019-07-15-12.53.00
Problem 4:
Also in STEP003, I want to add a nested field to event with information extracted from the mini program OPM
BBCPTOPM-STOC0002-FILE EXTRACTED : 000000000002
In the end, we have an event like the one in Problem 2 with an extra field
{
job: ABC
process: XYZ
...
extraInfo: [
fileExtracted: 2
]
}
Is there any way to solve all the problems and achieve the parsing for this kind of log? Thanks