Grok: matching multiple lines and saving the first and last timestamps to separate fields

Hello everyone, as the title says, I am trying to match multiple lines of logs, and I would like to save the first occurrence of the timestamp and the last occurrence of the timestamp to separate fields. I am using the multiline codec to split tasks out of a big log file, and (?m) in the regex before the timestamp. I need this to measure how long each process takes to complete. Any ideas how this could be done?

2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0
2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished

I would like the output to be:
Start: 15:43:31.605
Severity: INFO
INT: 18020
Thread: http-nio-8080-exec-3
Class: c.n.w.workflow.service.TaskService
GREEDYDATA: [...,...,...]
End: 15:43:34.346

I was thinking of taking the array of timestamps generated by (?m), removing every item between the first and the last, and then splitting the remaining two into separate fields, but I'm not really sure how to do that.
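For comparison, the array idea can be sketched in plain Ruby: collect every timestamp that starts a line with String#scan, then keep only the first and last elements. This is a minimal sketch, not a Logstash filter. The regex is a simplified, hypothetical stand-in for TIMESTAMP_ISO8601 that assumes the `2020-12-16 15:43:31.605` layout shown above, and `first_and_last_timestamps` is a made-up name.

```ruby
# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption: every log
# line starts with a timestamp like "2020-12-16 15:43:31.605").
# In Ruby, ^ matches at the start of every line, not only the string.
TS_AT_LINE_START = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}/

# Hypothetical helper: first and last line-start timestamps, or nil.
def first_and_last_timestamps(message)
  stamps = message.scan(TS_AT_LINE_START) # every timestamp, in order
  return nil if stamps.empty?

  # Everything between the first and the last is simply ignored.
  { start: stamps.first, end: stamps.last }
end

event = "2020-12-16 15:43:31.605  INFO 18020 --- [exec-3] DataService : start\n" \
        "Foo\n" \
        "2020-12-16 15:43:34.346  INFO 18020 --- [exec-1] TaskService : finished"
p first_and_last_timestamps(event)
```

Dates embedded in the text, such as `11.11.2020 13:20`, are not picked up because they do not match the assumed layout at the start of a line.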

grok { match => { "message" => "\A%{TIMESTAMP_ISO8601:start}.*^%{TIMESTAMP_ISO8601:end}[^\n]*\Z" } }

\A anchors the first timestamp to the beginning of the message field. For the end time, you anchor it to the start of a line using ^, and [^\n]*\Z means there cannot be another newline between that point and the very end of the text.
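The same anchoring can be tried outside Logstash with a minimal Ruby sketch (grok's regular expressions use the Oniguruma library, so Ruby behaves similarly). Assumptions: TIMESTAMP_ISO8601 is replaced by a simplified pattern for the `2020-12-16 15:43:31.605` layout, `first_and_last` is a hypothetical helper, and the elapsed-time calculation is added only to illustrate the goal of timing the task. Ruby's /m flag is the same as an inline (?m): it lets `.` match newlines, which is what allows `.*` to run across lines.

```ruby
require 'time'

# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption).
TS = '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'

# \A pins "start" to the beginning of the event. With /m, the greedy .*
# runs to the end of the text and backtracks until ^ lands on the start of
# the last line that begins with a timestamp. [^\n]*\Z then allows no
# further newline before the end (\Z also tolerates one trailing newline).
FIRST_LAST = /\A(?<start>#{TS}).*^(?<end>#{TS})[^\n]*\Z/m

TS_FORMAT = '%Y-%m-%d %H:%M:%S.%L'

# Hypothetical helper: start/end strings plus elapsed seconds, or nil.
def first_and_last(message)
  m = FIRST_LAST.match(message)
  return nil unless m

  elapsed = Time.strptime(m[:end], TS_FORMAT) - Time.strptime(m[:start], TS_FORMAT)
  { start: m[:start], end: m[:end], seconds: elapsed }
end
```

For the two sample lines above, `seconds` comes out at roughly 2.741. A single-line event does not match at all, because `^` needs a second line that starts with a timestamp.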


Yes! Thank you, that is exactly what I was looking for. There should be special awards for people like you who help others out on Saturday nights :)

I just tried incorporating it into my pattern, but I'm not quite getting the output I want from GREEDYDATA. For some reason I'm getting a single value rather than an array of all the GREEDYDATA values.
My entire pattern:
(?m)\A%{TIMESTAMP_ISO8601:start}.* %{SPACE} %{LOGLEVEL:LEVEL} %{INT:NUMBER} --{2} \[%{DATA:THREAD}] %{DATA:CLASS}\s(?m)%{GREEDYDATA:message}^%{TIMESTAMP_ISO8601:end}[^\n]*\Z

Is my usage correct?
I thought (?m) before GREEDYDATA indicated multiline input and that it would result in an array of values.

grok will not return an array of matches. If you need multiple matches for a single pattern, use a ruby filter and the String#scan method. There is an example of doing that here.
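A minimal Ruby sketch of the String#scan idea: unlike grok's single match, scan returns one match per log line. Assumptions: the line regex is a simplified, hypothetical approximation of the pattern used in this thread, continuation lines such as `Foo` are skipped, and `scan_lines` is a made-up name. Inside a Logstash ruby filter the input would come from `event.get("message")` and the result would be stored with `event.set(...)`.

```ruby
# Hypothetical, simplified line layout (assumption):
# timestamp, level, pid, "---", [thread], class, ": ", message text.
# ^ matches at the start of every line in Ruby, so scan finds each one.
LOG_LINE = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\s+(\w+)\s+(\d+) --- \[([^\]]*)\]\s+(\S+)\s+: ([^\n]*)/

# With capture groups, String#scan yields an array of captures per match.
def scan_lines(message)
  message.scan(LOG_LINE).map do |level, pid, thread, klass, text|
    { level: level, pid: pid, thread: thread, class: klass, text: text }
  end
end

message = "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\n" \
          "Foo\n" \
          "    Bar\n" \
          "2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished"

entries = scan_lines(message)
# The message text of every matched line, accumulated in one string.
all_text = entries.map { |e| e[:text] }.join("\n")
```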

Sorry, I didn't express myself correctly. I would be totally fine with output like the "extralines" field in the top answer here. I thought this could be done with a simple (?m).

It works for me. If I start with

   "message" => "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished"

then

     grok { match => { "message" => "(?m)\A%{TIMESTAMP_ISO8601:start}.* %{SPACE} %{LOGLEVEL:LEVEL} %{INT:NUMBER} --{2} \[%{DATA:THREAD}] %{DATA:CLASS}\s(?m)%{GREEDYDATA:message}^%{TIMESTAMP_ISO8601:end}[^\n]*\Z" } }

results in

       "end" => "2020-12-16 15:43:34.346",
     "CLASS" => "c.n.w.workflow.service.DataService",
     "LEVEL" => "INFO",
   "message" => [
    [0] "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished",
    [1] "      : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n"
],

etc. Note that the array of arrays is just how https://grokdebug.herokuapp.com/ presents the result. Even in the SO answer you linked to, GREEDYDATA matches a single string.
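A small Ruby sketch of that point, assuming Ruby's regex semantics match grok's (both use Oniguruma-style expressions): (?m), which Ruby exposes as Regexp::MULTILINE, only lets `.` match newlines. A GREEDYDATA-like `.*` can then span several lines, but one match still gives one capture: a single string with embedded newlines. The pattern is simplified and `greedy_capture` is a hypothetical name.

```ruby
# Simplified stand-in for TIMESTAMP_ISO8601 (assumption).
STAMP = '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'

# Skip the first line's header up to the first ":" after the timestamp,
# then capture everything (GREEDYDATA-style .*) up to the start of the
# last line that begins with a timestamp.
def greedy_capture(message, multiline:)
  flags = multiline ? Regexp::MULTILINE : 0
  re = Regexp.new("\\A#{STAMP}[^:]*:(?<rest>.*)^#{STAMP}", flags)
  m = re.match(message)
  m && m[:rest] # one String (or nil), never an array of lines
end

event = "2020-12-16 15:43:31.605  INFO x : first line\nFoo\n    Bar\n" \
        "2020-12-16 15:43:34.346  INFO y : last line"
p greedy_capture(event, multiline: true)  # → " first line\nFoo\n    Bar\n"
p greedy_capture(event, multiline: false) # → nil
```

Without the flag, `.*` stops at the first newline, so `^` can never be reached and the whole match fails.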

If you want each line in a separate array entry, use mutate + split.
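mutate's split option turns a string field into an array by splitting on a separator. The Ruby sketch below only mimics that on the GREEDYDATA value from the example above, to show the shape of the resulting array. `split_lines` is a hypothetical helper, and the strip/reject step is an optional extra that mutate itself does not do. When splitting on a newline in a Logstash config, note that escape sequences such as `\n` in config strings are only interpreted when `config.support_escapes` is enabled.

```ruby
# GREEDYDATA capture from the example: one string with embedded newlines.
captured = "      : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n"

# Hypothetical helper mimicking a split on "\n": one array entry per line.
# String#split drops trailing empty strings, so the final "\n" adds nothing.
# Stripping whitespace and dropping blank entries is an optional extra.
def split_lines(text)
  text.split("\n").map(&:strip).reject(&:empty?)
end

p split_lines(captured)
```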

I'm not sure what's causing this, but I can't seem to get the same output as you.


This still results in

Sorry for the screenshot; I hope the quality is good enough to read the content.

I copied your input message and pattern just to be 100% sure I'm not making a mistake somewhere myself. Thanks for the tip about mutate + split; I will certainly use it once I figure this out.

That is exactly what I would expect you to get. What do you think should be different?

Sorry, I re-read your answer; I thought I saw something different in the output. I get why I'm getting a single value now. Is it possible to get ALL the GREEDYDATA values in a single field? Thanks :)

That is what you get by default. I do not understand what you want that is different from what you are getting.

I mean the GREEDYDATA from ALL lines accumulated in a single field.