Standard grok patterns consisting of only a group

The documentation for the Grok filter states that to add a pattern to a custom patterns file, you:

  • write the pattern you need as the pattern name, a space, then the regexp for that pattern.

For example, doing the postfix queue id example as above:

# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9A-F]{10,11}

But when I look at some of the patterns that ship with Logstash, I see:

#Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
JAVAFILE (?:[a-zA-Z$_0-9. -]+)
#Allow special <init>, <clinit> methods
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
#Line number is optional in special cases 'Native method' or 'Unknown source'
JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:[java][log][origin][class][name]}\.%{JAVAMETHOD:[log][origin][function]}\(%{JAVAFILE:[log][origin][file][name]}(?::%{INT:[log][origin][file][line]:int})?\)
# Java Logs
JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
JAVALOGMESSAGE (?:.*)

A lot of the patterns are wrapped in an extended group (?:...).

This confused the heck out of me. Was I supposed to do this in my own custom patterns as well? Why were they doing this in some patterns (JAVAFILE, JAVAMETHOD) but not in others (JAVASTACKTRACEPART)?

After some trial and error in the Grok Debugger I've confirmed that wrapping the Grok expressions (which are already regular expressions) in an extended group is completely superfluous.
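A minimal sketch of that kind of comparison as a pipeline config, rather than in the Grok Debugger. The pattern names QID_PLAIN and QID_GROUPED and the sample line are made up for illustration, and the patterns are defined inline with the grok filter's pattern_definitions option instead of a patterns file:

```
input { generator { count => 1 lines => [ 'queue BEF25A72965 done' ] } }
output { stdout { codec => rubydebug { metadata => false } } }
filter {
    # Hypothetical pattern without the wrapper
    grok {
        pattern_definitions => { "QID_PLAIN" => "[0-9A-F]{10,11}" }
        match => { "message" => "queue %{QID_PLAIN:plain} done" }
    }
    # Same hypothetical pattern wrapped in (?:...)
    grok {
        pattern_definitions => { "QID_GROUPED" => "(?:[0-9A-F]{10,11})" }
        match => { "message" => "queue %{QID_GROUPED:grouped} done" }
    }
}
```

If the wrapper really is superfluous, the plain and grouped fields should end up with the same value and neither filter should tag the event with _grokparsefailure.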

Can we get rid of this then and keep these expressions as simple as possible and avoid confusing people new to Logstash like myself?

Or am I missing something here?

Thanks,
Frans

(?: ) is a non-capturing group: it groups part of the expression without capturing it, so you cannot back-reference it. I cannot think of a circumstance where that will matter for grok, because it is rare to use back-references, so I wouldn't include it in my own patterns.

For example, the following works:

input { generator { count => 1 lines => [ 'and .... slow slow quick quick slow' ] } }
output { stdout { codec => rubydebug { metadata => false } } }
filter {
    grok { match => { "message" => "\.\. (?<foo>%{WORD}) \k<foo> %{WORD} %{WORD} \k<foo>" } }
}

Changing the message to "and .... slow quick quick quick slow" gets a _grokparsefailure, so you clearly can use a named back-reference within the pattern (you cannot use a numeric back-reference).

If you change the named group (?<foo>%{WORD}) to a non-capturing (?:%{WORD}), it tells you that foo is undefined and cannot be back-referenced.
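For reference, this is roughly what that failing variant looks like, with the same input and output sections as above. Only the first group changes; the exact wording of the error is not reproduced here:

```
filter {
    # (?:%{WORD}) groups but does not define a group named foo,
    # so the \k<foo> back-references have nothing to refer to.
    grok { match => { "message" => "\.\. (?:%{WORD}) \k<foo> %{WORD} %{WORD} \k<foo>" } }
}
```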