Grok matching multi-lines saving first and last value to separate fields

Daniel_Jankech · February 20, 2021, 8:41pm

Hello everyone, as the title indicates, I am trying to match multiple lines of logs and save the first occurrence of the timestamp to one field and the last occurrence of the timestamp to another. I am using the multiline codec to separate tasks from a big log file, and (?m) in the regex before the timestamp. I need this to measure how long a process takes to complete. Any ideas how this could be done?

2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0
2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished

I would like the output to be
Start: 15:43:31.605
Severity: INFO
INT: 18020
Thread: http-nio-8080-exec-3
Class: c.n.w.workflow.service.TaskService
GREEDYDATA: [...,...,...]
End: 15:43:34.346

I was thinking of removing all items between the first and the last from the array of timestamps generated by (?m), and then separating those two into different fields, but I'm not really sure how to do so.

Badger · February 20, 2021, 9:19pm

grok { match => { "message" => "\A%{TIMESTAMP_ISO8601:start}.*^%{TIMESTAMP_ISO8601:end}[^\n]*\Z" } }

\A anchors the first timestamp to the beginning of the message field. For the end time, ^ anchors the timestamp to the start of a line, and [^\n]*\Z means there cannot be another newline from there to the very end of the text, so that timestamp has to be on the last line.
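The anchoring described above can be tried outside Logstash with a plain Ruby regex, since grok patterns compile down to a Ruby-compatible regex syntax. This is only a sketch under assumptions: `ts` is a simplified stand-in for TIMESTAMP_ISO8601, the sample event is abbreviated from the log lines in the question, and the `/m` flag plays the role of the (?m) prefix mentioned in the question so that `.` can cross newlines. The elapsed-time calculation at the end is an illustration of the stated goal, not something from the thread.

```ruby
require "time"

# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption: one fixed format).
ts = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}/

# /m lets "." match newlines, like (?m) in the grok pattern.
# In Ruby, "^" matches at the start of every line regardless of flags.
first_and_last = /\A(?<start>#{ts}).*^(?<end>#{ts})[^\n]*\Z/m

# Abbreviated multi-line event, as the multiline codec might deliver it.
message = [
  "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] DataService : Getting groups of task",
  "Foo",
  "2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] TaskService : Task [GENERATE] was finished"
].join("\n")

m = first_and_last.match(message)
first = m["start"]   # timestamp anchored by \A
last  = m["end"]     # timestamp at the start of the final line

# Illustration only: difference between the two timestamps in seconds.
fmt = "%Y-%m-%d %H:%M:%S.%L"
elapsed = (Time.strptime(last, fmt) - Time.strptime(first, fmt)).round(3)

puts "start=#{first} end=#{last} elapsed=#{elapsed}s"
```

Because `.*` is greedy and `[^\n]*\Z` forbids any further newline, the `end` group can only land on the last line, even if lines in between also start with a timestamp.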

Daniel_Jankech · February 20, 2021, 9:46pm

Yes! Thank you, that is exactly what I was looking for. There should be special awards for people like you helping out others on Saturday nights :)

Daniel_Jankech · February 21, 2021, 10:07am

I just tried to incorporate it into my pattern, but I'm not quite getting the output I would like from GREEDYDATA. For some reason I'm getting a single value rather than an array of all the GREEDYDATA values.
My entire pattern:
(?m)\A%{TIMESTAMP_ISO8601:start}.* %{SPACE} %{LOGLEVEL:LEVEL} %{INT:NUMBER} --{2} \[%{DATA:THREAD}] %{DATA:CLASS}\s(?m)%{GREEDYDATA:message}^%{TIMESTAMP_ISO8601:end}[^\n]*\Z

Is my usage correct?
I thought (?m) before GREEDYDATA indicates multiline input, and that this would result in an array of values.

Badger · February 21, 2021, 6:03pm

grok will not return an array of matches. If you need multiple matches for a single pattern, use a ruby filter and the String#scan method. There is an example of doing that here.
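As a rough illustration of what String#scan does, here is a minimal plain-Ruby sketch. The `ts` regex is a simplified stand-in for TIMESTAMP_ISO8601 and the sample lines are abbreviated from the question; both are assumptions made for the example. Inside a Logstash ruby filter the string would typically come from `event.get("message")` and the results would be stored with `event.set`.

```ruby
# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption: one fixed format).
ts = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}/

# Abbreviated two-line event.
message = [
  "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] DataService : Getting groups of task",
  "2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] TaskService : Task [GENERATE] was finished"
].join("\n")

# Without capture groups, scan returns every match as a flat array of strings.
timestamps = message.scan(/^#{ts}/)

# With one capture group, scan returns an array of one-element arrays,
# so flatten yields the text after " : " from every line.
texts = message.scan(/^#{ts}.*? : ([^\n]*)/).flatten

puts timestamps.inspect
puts texts.inspect
```

The first and last entries of `timestamps` correspond to the start and end values, and `texts` holds the trailing part of every line, which is the kind of per-line accumulation a single GREEDYDATA capture cannot produce.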

Daniel_Jankech · February 21, 2021, 8:48pm

Sorry, I didn't express myself correctly. I would be totally fine with the output shown in the top answer here, in the field "extralines". I thought this could be done with simple (?m) usage.

Badger · February 21, 2021, 9:24pm

It works for me. If I start with

   "message" => "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished"

then

     grok { match => { "message" => "(?m)\A%{TIMESTAMP_ISO8601:start}.* %{SPACE} %{LOGLEVEL:LEVEL} %{INT:NUMBER} --{2} \[%{DATA:THREAD}] %{DATA:CLASS}\s(?m)%{GREEDYDATA:message}^%{TIMESTAMP_ISO8601:end}[^\n]*\Z" } }

results in

       "end" => "2020-12-16 15:43:34.346",
     "CLASS" => "c.n.w.workflow.service.DataService",
     "LEVEL" => "INFO",
   "message" => [
    [0] "2020-12-16 15:43:31.605  INFO 18020 --- [http-nio-8080-exec-3] c.n.w.workflow.service.DataService       : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n2020-12-16 15:43:34.346  INFO 18020 --- [http-nio-8080-exec-1] c.n.w.workflow.service.TaskService       : [5fda1d109ceec746643760f5]: Task [GENERATE] in case [11.11.2020 13:20] assigned to [super@netgrif.com] was finished",
    [1] "      : Getting groups of task 5fda1d109ceec746643760f8 in case 11.11.2020 13:20 level: 0\nFoo\n    Bar\n"
],

etc. Note that the array of arrays is just a presentation thing at https://grokdebug.herokuapp.com/. Even in that SO answer you linked to, the GREEDYDATA is matching a single string.

If you want to get each line in a separate array entry then use mutate+split.
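To show what splitting on newlines would do to the captured text, here is a small plain-Ruby sketch. The `body` string is an abbreviated, assumed version of the single string captured by GREEDYDATA in the output above; it only approximates what a mutate split on "\n" would produce inside Logstash.

```ruby
# Abbreviated stand-in for the single string captured by GREEDYDATA.
body = "Getting groups of task 5fda1d109ceec746643760f8\nFoo\n    Bar\n"

# Splitting on the newline gives one array entry per line;
# Ruby drops the trailing empty entry left by the final "\n".
lines = body.split("\n")

# Optional: remove the indentation that continuation lines carry.
trimmed = lines.map(&:strip)

puts lines.inspect
puts trimmed.inspect
```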

Daniel_Jankech · February 22, 2021, 11:39am

I'm not sure what's causing this, but I can't seem to get the same output as you.

This still results in

Sorry for the screenshot, I hope it's good enough quality to see the content.

I copied your input message and pattern just to be 100% sure I'm not making a mistake anywhere myself. Thanks for the tip about mutate + split, I will certainly do that after I figure this out.

Badger · February 22, 2021, 2:29pm

That is exactly what I would expect you to get. What do you think should be different?

Daniel_Jankech · February 22, 2021, 2:42pm

Sorry, I re-read your answer; I thought I saw something different in the output. I get why I'm getting a single value now. Is it possible for me to get ALL the GREEDYDATA values in a single field? Thanks :)

Badger · February 22, 2021, 2:57pm

That is what you get by default. I do not understand what you want that is different from what you are getting.

Daniel_Jankech · February 22, 2021, 3:08pm

I mean the GREEDYDATA from ALL lines accumulated in a single field.

system · March 22, 2021, 3:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
