Standard grok patterns consisting of only a group

frans-wtax · July 14, 2025, 1:58pm

The documentation for the Grok filter states that to add a pattern to a custom patterns file, you:

write the pattern you need as the pattern name, a space, then the regexp for that pattern.

For example, doing the postfix queue id example as above:
# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9A-F]{10,11}

But when I look at some of the patterns that ship with Logstash, I see:

#Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
JAVAFILE (?:[a-zA-Z$_0-9. -]+)
#Allow special <init>, <clinit> methods
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
#Line number is optional in special cases 'Native method' or 'Unknown source'
JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:[java][log][origin][class][name]}\.%{JAVAMETHOD:[log][origin][function]}\(%{JAVAFILE:[log][origin][file][name]}(?::%{INT:[log][origin][file][line]:int})?\)
# Java Logs
JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
JAVALOGMESSAGE (?:.*)

A lot of the patterns are wrapped in an extended group (?:...).

This confused the heck out of me. Was I supposed to do this in my own custom patterns as well? Why were they doing this in some patterns (JAVAFILE, JAVAMETHOD) but not in others (JAVASTACKTRACEPART)?

After some trial and error in the Grok Debugger I've confirmed that wrapping the Grok expressions (which are already regular expressions) in an extended group is completely superfluous.
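A rough way to check the same thing outside Logstash is the Python sketch below. It uses Python's re module rather than the Oniguruma engine behind grok, so it is only an analogy, and the sample strings are made up for illustration. It compares the POSTFIX_QUEUEID expression with and without the (?:...) wrapper.

```python
import re

# The POSTFIX_QUEUEID expression from the docs, bare and wrapped in (?:...).
BARE = r"[0-9A-F]{10,11}"
WRAPPED = r"(?:[0-9A-F]{10,11})"

# Hypothetical sample inputs, not taken from real Postfix logs.
samples = ["BEF25A72965", "0123456789", "not-a-queue-id", "ABC"]


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


bare_results = [matches(BARE, s) for s in samples]
wrapped_results = [matches(WRAPPED, s) for s in samples]

# Both forms accept and reject exactly the same inputs.
print(bare_results == wrapped_results)

# The wrapper adds no capture group of its own.
print(re.compile(WRAPPED).groups)
```

This only tests the expression on its own. It says nothing about how grok expands %{NAME} references inside a larger pattern.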

Can we get rid of this then and keep these expressions as simple as possible and avoid confusing people new to Logstash like myself?

Or am I missing something here?

Thanks,
Frans

Badger · July 14, 2025, 4:28pm

(?: ) is a non-capturing group, meaning you cannot back-reference it. I cannot think of a circumstance where that will matter for grok because it is rare to use backreferences, so I wouldn't include it in my own patterns.

For example, the following works:

input { generator { count => 1 lines => [ 'and .... slow slow quick quick slow' ] } }
output { stdout { codec => rubydebug { metadata => false } } }
filter {
    grok { match => { "message" => "\.\. (?<foo>%{WORD}) \k<foo> %{WORD} %{WORD} \k<foo>" } }
}

Changing the message to "and .... slow quick quick quick slow" gets a _grokparsefailure, so you clearly can use a named back-reference within the pattern (you cannot use a numeric back-reference).

If you change the named group (?<foo>%{WORD}) to the non-capturing (?:%{WORD}), it tells you that foo is undefined and cannot be back-referenced.
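The same behaviour can be sketched in plain Python as a rough analogy. Python's re is not the Oniguruma engine grok uses, and it spells the named group (?P<foo>...) and the back-reference (?P=foo) instead of (?<foo>...) and \k<foo>. Here \w+ stands in for %{WORD}, which is an assumption of this sketch and not grok's actual definition.

```python
import re

# Named group plus two named back-references, mirroring the grok example above.
PATTERN = re.compile(r"\.\. (?P<foo>\w+) (?P=foo) \w+ \w+ (?P=foo)")

# The second and last words repeat the first, so this matches.
ok = PATTERN.search("and .... slow slow quick quick slow")

# The second word differs from the first, so the back-reference fails.
bad = PATTERN.search("and .... slow quick quick quick slow")

# With a non-capturing group there is no group named foo to refer back to,
# so the pattern is rejected at compile time.
try:
    re.compile(r"\.\. (?:\w+) (?P=foo)")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

print(ok.group("foo") if ok else None)
print(bad)
print(compile_error)
```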
