Help with Grok pattern

I am using grok and am stuck on the issue described below. The problem appears when a log line contains more than one colon (":"). The GREEDYDATA part matches past the first colon, so the line is not split at the first ":".

I am using a custom pattern like the one below, where jstkeyname and jstkeyvalue should come back as a key/value pair. The result is wrong for "Job URL" and "Source" because those lines contain an additional : in the log. I am wondering what I am doing wrong here. I tried making the : non-greedy by adding ? to it, as in %{SJTKEYNAME:jstkeyname}:?%{SJTKEYVALUE}, but that is not working.

SJTKEYNAME %{NOTSPACE}%{GREEDYDATA}
SJTKEYVALUE %{SPACE}%{GREEDYDATA:jstkeyvalue}
%{SJTKEYNAME:jstkeyname}:%{SJTKEYVALUE}

LOG

Job ID:        2016-01-26-081744.xycdc,sdfsdf
JST System:        sowest
JST Version:       3.3.245: (2016-01-25) Case of the Arrogant Arsonist 
Job URL:    http://scxxxx.yfc.com/archive/2016/01/2016-01-26
Job ARCHIVE:      /net/scxxxx.yfc.com/export/archives/data/jst/archive/2016/
User:             tohartma - tobias.hartmann@umanglob.com
Release:            mdk9
Source:             Mercurial: /umanglob/{.,mdk,jaxp,pubs,corba,jaxws,deploy}
File List:          {.}

OUTPUT FROM GROK MATCHER

Job URL: http://scxxxx.yfc.com/archive/2016/01/2016-01-26-081744.sads,/fd.comp
MATCHED
jstkeyname	Job·URL:············http
jstkeyvalue	//scxxxx.yfc.com/archive/2016/01/2016-01-26-081744.sads,/fd.comp


Source: Mercurial: /umanglob/gk/hs-comp/{.,mdk,jaxp,pubs,corba,jaxws,deploy}
MATCHED
jstkeyname	Source:·············Mercurial
jstkeyvalue	/umanglob/gk/hs-comp/{.,mdk,jaxp,pubs,corba,jaxws,deploy}

This is a good example of why one must be careful with multiple GREEDYDATA and/or DATA patterns. Your GREEDYDATA in SJTKEYNAME will indeed be greedy and match up to the last colon in the line, which for some lines happens to be the one after "http".

You should always use as strict boundary conditions as you can. The key in each line doesn't consist of "any number of arbitrary characters" (a reasonable interpretation of GREEDYDATA), it's rather "any number of arbitrary characters except colon". If we translate that back into a grok pattern we get this:

SJTKEYNAME %{NOTSPACE}[^:]*

Well, technically this pattern requires a non-empty key (because of the leading NOTSPACE) but that's probably a good idea anyway.
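
To see the difference outside of Logstash, here is a minimal Python sketch. It assumes the usual regex expansions of the stock grok patterns (NOTSPACE as \S+, GREEDYDATA as .*, SPACE as \s*) and uses Python's re module as a stand-in for grok's regex engine, so treat it as an illustration rather than an exact reproduction of what grok does.

```python
import re

# Assumed regex equivalents of the stock grok patterns:
#   NOTSPACE -> \S+    GREEDYDATA -> .*    SPACE -> \s*
# Original key pattern: %{NOTSPACE}%{GREEDYDATA}
greedy = re.compile(r"(?P<jstkeyname>\S+.*):(?P<jstkeyvalue>\s*.*)")
# Suggested key pattern: %{NOTSPACE}[^:]*
strict = re.compile(r"(?P<jstkeyname>\S+[^:]*):(?P<jstkeyvalue>\s*.*)")

line = "Job URL:    http://scxxxx.yfc.com/archive/2016/01/2016-01-26"

# .* runs to the end of the line and backtracks only as far as the LAST colon.
greedy_key = greedy.search(line).group("jstkeyname")
# [^:]* cannot cross a colon, so the key stops at the first one it meets.
strict_key = strict.search(line).group("jstkeyname")

print(repr(greedy_key))  # 'Job URL:    http'
print(repr(strict_key))  # 'Job URL'
```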

Secondly, why include %{SPACE} in SJTKEYVALUE? The space isn't really part of the value, is it?


Hi Magnus,
Thanks for your response. I tried the above and it fixed the issue with Job URL, but the issue still persists with Source: Mercurial:

  Source: Mercurial: /umanglob/{.,mdk,jaxp,pubs,corba,jaxws,deploy}
 MATCHED
 jstkeyname	Source:·············Mercurial
 jstkeyvalue	/umanglob/{.,mdk,jaxp,pubs,corba,jaxws,deploy}

Regarding %{SPACE}, there is whitespace between the : and the value, and I just wanted to get rid of it, hence I wrote it that way. Is that not the right approach?

Thanks for your response. I tried the above and it fixed the issue with Job URL, but the issue still persists with Source: Mercurial:

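Ah, right. NOTSPACE means \S+, so just like GREEDYDATA it can match more than expected (not any character, but any non-whitespace character, and that includes the colon). On the Source line it consumes "Source:" in one go, and [^:]* then carries on up to the next colon. Remove it from the pattern. It's not useful.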
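
A quick way to convince yourself is the Python sketch below. It again assumes NOTSPACE is \S+, GREEDYDATA is .* and SPACE is \s*, and uses Python's re module only as an approximation of grok's regex engine. The variable names are made up for the illustration.

```python
import re

# The Source line from the log, with 13 spaces after the first colon.
line = "Source:" + " " * 13 + "Mercurial: /umanglob/{.,mdk,jaxp,pubs,corba,jaxws,deploy}"

# Key pattern %{NOTSPACE}[^:]* : \S+ happily consumes the colon in "Source:".
with_notspace = re.compile(r"(?P<jstkeyname>\S+[^:]*):(?P<jstkeyvalue>\s*.*)")
# Key pattern [^:]* alone: nothing in the key can match a colon.
without_notspace = re.compile(r"(?P<jstkeyname>[^:]*):(?P<jstkeyvalue>\s*.*)")

with_key = with_notspace.search(line).group("jstkeyname")
without_key = without_notspace.search(line).group("jstkeyname")

print(repr(with_key))     # 'Source:             Mercurial'
print(repr(without_key))  # 'Source'
```

Note that [^:]* on its own also accepts an empty key. If you want to keep the non-empty requirement, [^:]+ should do it.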

Regarding %{SPACE}, there is whitespace between the : and the value, and I just wanted to get rid of it, hence I wrote it that way. Is that not the right approach?

The problem is that you're making the space part of the SJTKEYVALUE pattern, so %{SJTKEYVALUE:foo} will include any leading space in the captured foo field. It's probably better to treat the whitespace as a separator, just like the colon.
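
As a rough sketch of what that could look like: keep SJTKEYNAME as [^:]+ and move %{SPACE} out of the value, i.e. something along the lines of %{SJTKEYNAME:jstkeyname}:%{SPACE}%{GREEDYDATA:jstkeyvalue}. That exact expression is an assumption on my part and untested in Logstash. The Python below mirrors it with the usual regex expansions (SPACE as \s*, GREEDYDATA as .*) and runs it over a few lines from the log.

```python
import re

# Assumed regex equivalent of:
#   %{SJTKEYNAME:jstkeyname}:%{SPACE}%{GREEDYDATA:jstkeyvalue}
# with SJTKEYNAME defined as [^:]+ (hypothetical final pattern).
# The \s* sits outside both capture groups, so it acts purely as a separator.
KEY_VALUE = re.compile(r"(?P<jstkeyname>[^:]+):\s*(?P<jstkeyvalue>.*)")

sample_lines = [
    "Job URL:    http://scxxxx.yfc.com/archive/2016/01/2016-01-26",
    "Source:             Mercurial: /umanglob/{.,mdk,jaxp,pubs,corba,jaxws,deploy}",
    "File List:          {.}",
]

parsed = {}
for line in sample_lines:
    m = KEY_VALUE.match(line)
    if m:
        parsed[m.group("jstkeyname")] = m.group("jstkeyvalue")

for key, value in parsed.items():
    print(f"{key!r} -> {value!r}")
```

Only the first colon ends the key, and any later colons stay in the value without leading whitespace.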
