I am using grok and am stuck on the issue described below. I run into trouble when a log line contains more than one colon (":"). The pattern uses GREEDYDATA, and it does not stop at the first ":".
I am using a custom pattern like this, where jstkeyname and jstkeyvalue are returned as a key-value pair. The result is wrong for "Job URL" and "Source" because those lines contain an additional ":". I am wondering what I am doing wrong here. I tried making the ":" non-greedy by adding ? to it, as in %{SJTKEYNAME:jstkeyname}:?%{SJTKEYVALUE}, but that is not working.
This is a good example of why one must be careful with multiple GREEDYDATA and/or DATA patterns. Your GREEDYDATA in SJTKEYNAME will indeed be greedy and match up to the last colon on the line, which for some lines happens to be the one in "http:".
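The greedy behavior described above can be reproduced outside of grok. The following is a minimal sketch using Python's re module rather than grok itself; for an expression this simple the matching should behave the same way. GREEDYDATA is .* in the stock grok patterns. The sample line and its URL are made up, and the value part is simplified to .* because the original SJTKEYVALUE definition is not shown in the thread.

```python
import re

# Hypothetical log line; the real URL does not appear in the thread.
line = "Job URL: http://ci.example.com/job/42"

# Key modeled as GREEDYDATA (.*): it backtracks only as far as the LAST colon.
greedy = re.match(r"(?P<key>.*):(?P<value>.*)", line)

greedy_key = greedy.group("key")      # "Job URL: http"
greedy_value = greedy.group("value")  # "//ci.example.com/job/42"

print(repr(greedy_key), repr(greedy_value))
```

The key swallows everything up to the colon in "http:", which is the wrong split reported for the "Job URL" line.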
You should always use as strict boundary conditions as you can. The key in each line doesn't consist of "any number of arbitrary characters" (a reasonable interpretation of GREEDYDATA), it's rather "any number of arbitrary characters except colon". If we translate that back into a grok pattern we get this:
SJTKEYNAME %{NOTSPACE}[^:]*
Well, technically this pattern requires a non-empty key (because of the leading NOTSPACE) but that's probably a good idea anyway.
Secondly, why include %{SPACE} in SJTKEYVALUE? The space isn't really part of the value, is it?
Thanks for your response. I tried the above and it fixed the issue with Job URL, but the issue still persists with Source: Mercurial:
Ah, right. NOTSPACE means \S+ so just like GREEDYDATA it can match more than expected (but not any character, just non-whitespace characters). Remove it from the pattern. It's not useful.
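To see why the leading NOTSPACE matters, here is a small sketch in Python's re module (not grok itself) comparing \S+[^:]* with plain [^:]*. Only the "Job URL" and "Source: Mercurial:" prefixes come from the thread. The rest of each sample line is invented, and the value is again simplified to .* as an assumption.

```python
import re

# Hypothetical sample lines; the URLs are placeholders.
lines = [
    "Job URL: http://ci.example.com/job/42",
    "Source: Mercurial: http://hg.example.com/repo",
]

# %{NOTSPACE}[^:]* expands to \S+[^:]*, and \S+ can itself consume a colon.
with_notspace = re.compile(r"(?P<key>\S+[^:]*):(?P<value>.*)")
# With NOTSPACE removed, the key can never contain a colon.
colon_free = re.compile(r"(?P<key>[^:]*):(?P<value>.*)")

def keys(pattern):
    """Return the captured key for every sample line."""
    return [pattern.match(line).group("key") for line in lines]

print(keys(with_notspace))  # ['Job URL', 'Source: Mercurial']
print(keys(colon_free))     # ['Job URL', 'Source']
```

In the "Source" line there is no whitespace before the first colon, so \S+ runs straight through "Source:" and the key ends at the second colon instead. Note that [^:]* also allows an empty key; [^:]+ would be one way to keep the non-empty requirement.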
Regarding %{SPACE}: there is a space between the : and the value, and I just wanted to get rid of that space, hence I wrote it that way. Is that not the right approach?
The problem is that you're making the space part of the SJTKEYVALUE pattern, so %{SJTKEYVALUE:foo} will include any leading space in the captured foo field. It's probably better to treat the whitespace as a separator, just like the colon.
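The difference between whitespace inside the value and whitespace used as a separator can be sketched like this, again with Python's re module standing in for grok. SPACE is \s* in the stock grok patterns. The sample line is hypothetical and the value is modeled as .*, since the real SJTKEYVALUE definition is not shown.

```python
import re

# Hypothetical log line; the URL is a placeholder.
line = "Job URL: http://ci.example.com/job/42"

# Whitespace INSIDE the value group, mirroring %{SPACE} as part of SJTKEYVALUE.
inside = re.match(r"(?P<key>[^:]*):(?P<value>\s*.*)", line)
# Whitespace BETWEEN the colon and the value group, treated as a separator.
outside = re.match(r"(?P<key>[^:]*):\s*(?P<value>.*)", line)

value_inside = inside.group("value")    # keeps the leading space
value_outside = outside.group("value")  # leading space is consumed by the separator

print(repr(value_inside), repr(value_outside))
```

In grok terms this would correspond to something like %{SJTKEYNAME:jstkeyname}:%{SPACE}%{SJTKEYVALUE:jstkeyvalue}, assuming SJTKEYVALUE itself no longer begins with %{SPACE}.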