Grok Patterns - issue using repetitive regular expressions


#1

Hey guys,

I've been using the ELK Stack for a while and it really is a great tool. Right now I'm facing a problem that I have been trying to solve for a while.

I wrote a custom SAP pattern to match SAP tasks from my company's logs. It works pretty well. The problem is that there are log events where 1 to n SAP tasks occur consecutively. The pattern repeats as follows.

{SAP},.{SAP},. ........ {SAP}
(1-n) (always one time in the end without a comma)
So the first 1 to (n-1) tasks are each followed by a comma and a space, and after the closing SAP task there is neither a comma nor a space.

At first I thought I could solve this with the (*)-operator, or with one pattern for the SAP tasks followed by a comma and a second pattern without the comma. I tried regular expressions like the following:
(%{SAP},.){1,}%{SAP} -> {1,} basically means that the pattern occurs 1 to x times.
(%{SAP},.)*%{SAP} -> SAP task with comma and space n times, then one closing SAP task with neither comma nor space

So far so good.
When the input contains two consecutive SAP tasks, this pattern works very well. Everything is matched correctly. The problem occurs when there are more SAP tasks in the input than there are SAP patterns in my grok pattern.
Grok seems to somehow hide or eat up the values of the remaining SAP tasks. These values can't be found anywhere in the output.
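The "eaten" values are consistent with how most regex engines treat a capture group inside a repetition: the group is overwritten on every iteration, so only the last iteration's value survives. Below is a minimal sketch in Python's `re` module, not in grok itself; it assumes grok's engine behaves analogously here, and the names `head` and `last` and the sample line are made up for illustration.

```python
import re

# Hypothetical stand-in input: four "tasks" separated by comma and space.
line = "task1, task2, task3, task4"

# Same shape as (%{SAP},.)*%{SAP}: a repeated group, then one closing task.
pattern = re.compile(r"(?:(?P<head>\w+), )*(?P<last>\w+)")

m = pattern.fullmatch(line)

# The repeated group matched three times, but only its final
# iteration is kept; task1 and task2 are not available afterwards.
print(m.group("head"))  # task3
print(m.group("last"))  # task4
```

So the whole line matches, but only the last two tasks end up in fields.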
If I hard-code it, counting the consecutive SAP tasks in the input and copy-pasting the SAP pattern as often as it occurs in the input, it works perfectly and all values are matched to the specific fields.
One could say: well, just hard-code the pattern 50 times in your grok pattern and you're fine. That leads to the next problem. If my custom grok pattern contains just one SAP pattern more than there are SAP tasks in my input, the filter won't match at all.
Here is that case as an example.

Input:
{SAP},.{SAP}
Pattern:
%{SAP},.%{SAP},.%{SAP}

-> no match at all
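The mismatch can be reproduced outside of grok. The following is only a sketch in Python's `re`, with `TASK` as a hypothetical stand-in for one %{SAP} task. It also shows one possible variation, which is an assumption on my side and not something tested in Logstash: keep the first copy mandatory and wrap each additional copy in an optional group.

```python
import re

TASK = r"(\w+)"  # hypothetical stand-in for one %{SAP} task

# Three mandatory copies, like %{SAP},.%{SAP},.%{SAP}
fixed = re.compile(TASK + ",." + TASK + ",." + TASK)

# One mandatory copy followed by two optional ones
optional = re.compile(TASK + "(?:,." + TASK + ")?(?:,." + TASK + ")?")

two_tasks = "task1, task2"

fixed_result = fixed.fullmatch(two_tasks)
optional_result = optional.fullmatch(two_tasks)

print(fixed_result)              # None: the third mandatory copy has nothing to match
print(optional_result.groups())  # ('task1', 'task2', None)
```

With optional copies the upper bound is still hard-coded, but shorter inputs no longer cause the whole match to fail.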

I hope you guys can understand what I am trying to explain, as it is very specific. If I need to go into more detail, just let me know. I'm very interested in solving this problem :).

Thank you in advance.


(M1k3ga) #2

Hi Marv,
I hope I understand your problem correctly.
You have a log line like
"SAPvalue, SAPvalue, SAPvalue, ..., SAPvalue SAPvalue" (comma separated and at the end w/o comma).

A possible pattern could be
SAP ([A-Za-z])
SAP_MULTI (%{SAP},?\s)

For an example line like
SAPvalue, SAPvalue, SAPvalue, SAPvalue SAPvalue blubb bla
you get as result:
"SAP_MULTI": [ [ "SAPvalue, SAPvalue, SAPvalue, SAPvalue SAPvalue file path " ] ],

I hope that helps


#3

Hi,

thanks for your answer.
Unfortunately it is not exactly what I meant. I could probably have used the csv filter for data like the kind you mentioned.

I'll try to be more specific. The data I am trying to parse looks exactly like this.

someTaskType someId someId taskName -- user -- timeAsString, taskState, duration, someId

the pattern I use is:

SAP %{WORD:someTaskType}{1}.%{WORD:someId}{1}.%{WORD:someId2}{1}.%{DATA:taskName}{1}.--.%{WORD:user}{1}.--.%{DATA:timeAsString}{1},.%{WORD:taskState}{1},.%{DATA:duration}{1},.%{WORD:someId3}{1}

Note: I know I should try not to use the DATA pattern, but it is absolutely necessary because, for example, user names can occur in the logs in different formats, e.g. surname, lastname or lastname surname or surname.lastname
So I have no other choice than to fetch this data with the DATA pattern.

As I mentioned, this works perfectly as long as I hard-code it. What I mean by that is that I count the tasks occurring in the input and alter my pattern depending on this number. E.g. 3 different SAP tasks. Note that every new SAP task starts with someTaskType:

someTaskType someId someId taskName -- user -- timeAsString, taskState, duration, someId, someTaskType someId someId taskName -- user -- timeAsString, taskState, duration, someId, someTaskType someId someId taskName -- user -- timeAsString, taskState, duration, someId

So if I count 3 tasks the pattern is.

%{SAP},.%{SAP},.%{SAP}

This matches perfectly. If I try to do it with the (*)-operator, the output only contains the values for the patterns I hard-coded. The same thing happens when I try your solution. It matches the last two tasks but ignores the tasks which occur before them. It is pretty awkward.

Thanks in advance.
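One way to sidestep the repetition problem is to apply the single-task pattern repeatedly instead of repeating it inside one big expression. The sketch below does that in Python with `finditer`; it is not a Logstash configuration, although the same idea could probably be applied there with a scripted step such as a ruby filter. Assumptions: WORD is approximated as `\w+`, DATA as a lazy `.*?`, and all sample values are made up to fit the layout described above.

```python
import re

# Rough Python translation of the SAP grok pattern from this thread.
# Assumption: WORD behaves like \w+ and DATA like a lazy .*?
SAP = re.compile(
    r"(?P<someTaskType>\w+).(?P<someId>\w+).(?P<someId2>\w+).(?P<taskName>.*?)"
    r".--.(?P<user>\w+).--.(?P<timeAsString>.*?)"
    r",.(?P<taskState>\w+),.(?P<duration>.*?),.(?P<someId3>\w+)"
)

# Made-up sample values in the documented layout, three tasks on one line.
line = (
    "BATCH 1001 2002 Nightly export -- jdoe -- 2017-03-01 02:00:00, FINISHED, 35s, 3003, "
    "DIALOG 1004 2005 Order check -- asmith -- 2017-03-01 02:05:00, RUNNING, 12s, 3006, "
    "BATCH 1007 2008 Cleanup -- jdoe -- 2017-03-01 02:10:00, FAILED, 3s, 3009"
)

# Each match is one complete task, so no values are overwritten,
# and the number of tasks does not have to be known in advance.
tasks = [m.groupdict() for m in SAP.finditer(line)]

for task in tasks:
    print(task["someTaskType"], task["taskName"], task["taskState"], task["someId3"])
```

Because every task becomes its own match, this works for any number of tasks on a line. It does rely on the lazy DATA fields not containing the separators themselves (" -- " or ", ").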

