Warning Message in Logstash 8: Redundant Nested Repeat Operator in Regular Expression

Hello Elastic community,

I've encountered an issue after migrating to Logstash version 8. I'm seeing a new warning message related to a regular expression. Here are the details:

Issue:
After the migration, I'm receiving the following warning message:

/logstash-8.15/vendor/bundle/jruby/3.1.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:127: warning: regular expression has redundant nested repeat operator *

Problematic Regular Expression:

The warning appears to be related to a complex regular expression used for parsing syslog messages. Due to its length and complexity, I'll provide it in a separate comment below this post.

Relevant Code:
The regex is compiled by the Grok library (grok-pure.rb) from the patterns in my pipeline configuration. Here's the relevant part of the configuration:


    if [type] in ["syslog", "snmp_trap_syslog"]
    {
        grok {
            patterns_dir => ["./patterns"]
            match => {
                "message" => [ 
                    "<%{NONNEGINT:syslog_pri}>%{NONNEGINT:syslog_version}%{SPACE}%{TIMESTAMP_ISO8601:syslog_timestamp_original}%{SPACE}(?:-|%{IPORHOST:syslog_host})%{SPACE}(?:-|%{SYSLOG5424PRINTASCII:syslog_program})%{SPACE}(?:-|%{SYSLOG5424PRINTASCII:syslog_process_id})%{SPACE}(?:-|%{SYSLOG5424PRINTASCII:syslog_message_id})%{SPACE}(?:- |)%{SPACE}(?:%{GREEDYDATA:syslog_message_rfc5424}|)" , 
                    
                    "(<%{NONNEGINT:syslog_pri}>)?(%{SPACE})?(%{SYSLOGTIMESTAMP:syslog_timestamp_original}|%{TIMESTAMP_ISO8601:syslog_timestamp_original})%{SPACE}(?:-|%{IPORHOST:syslog_host})%{SPACE}?%{GREEDYDATA:syslog_message_rfc3164}" ,

                    "(<%{NONNEGINT:syslog_pri}>)?(%{SPACE})?(%{SYSLOGTIMESTAMP:syslog_timestamp_original}|%{TIMESTAMP_ISO8601:syslog_timestamp_original})?(%{SPACE})?%{GREEDYDATA:syslog_message_rfc3164}"
                ]
            }
        }
    }

Expression:
vendor/bundle/jruby/3.1.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:127: warning: regular expression has redundant nested repeat operator * 
/(<(?<NONNEGINT:syslog_pri>\b(?:[0-9]+)\b)>)?((?:\s*))?((?<SYSLOGTIMESTAMP:syslog_timestamp_original>(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)
?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)
 +(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])) (?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])))
 |(?<TIMESTAMP_ISO8601:syslog_timestamp_original>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9]))
 :?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?))(?:\s*)(?:-|(?<IPORHOST:syslog_host>
 (?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))
 {3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}
 (((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}
 (((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}
 (((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}
 (((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4})
 {1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|(?:(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}
 |2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))))
 |(?:\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)))))(?:\s*)?(?<GREEDYDATA:syslog_message_rfc3164>.*)/

    grok { match => { "message" => "%{SPACE}?" } }

will result in the warning message

    warning: regular expression has redundant nested repeat operator * /(?:\s*)?/

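SPACE expands to (?:\s*), so it already matches zero or more whitespace characters and the whitespace is already optional. The trailing ? (or the ()? wrapper) is redundant, and you can replace it with a plain %{SPACE} in all four places: (%{SPACE})? and %{SPACE}? in the second pattern, and both occurrences of (%{SPACE})? in the third.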
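
If you want to convince yourself that dropping the ? changes nothing, here is a minimal Ruby sketch (Ruby because jls-grok compiles patterns into Ruby regexes). It assumes SPACE expands to (?:\s*), as the compiled regex in the warning shows. The constant names, the helper compare_space_variants, and the sample strings are made up for illustration and are not part of Logstash or jls-grok.

```ruby
# SPACE expands to (?:\s*), as the compiled regex in the warning shows.
SPACE_EXPANSION = '(?:\s*)'

# Built from strings so both variants are compiled at runtime.
WITH_EXTRA_Q = Regexp.new("\\Afoo#{SPACE_EXPANSION}?bar\\z") # like %{SPACE}?
WITHOUT_Q    = Regexp.new("\\Afoo#{SPACE_EXPANSION}bar\\z")  # like %{SPACE}

# Hypothetical inputs: no space, one space, mixed whitespace, a non-space.
SAMPLES = ["foobar", "foo bar", "foo \t bar", "foo-bar"].freeze

# Returns [result_with_extra_q, result_without_q] for one input string.
def compare_space_variants(text)
  [WITH_EXTRA_Q.match?(text), WITHOUT_Q.match?(text)]
end

SAMPLES.each do |text|
  with_q, without_q = compare_space_variants(text)
  puts "#{text.inspect}: with ?=#{with_q}, without ?=#{without_q}"
end
```

Both variants should accept and reject the same inputs, which is why the extra quantifier can be removed safely.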

One thing to watch out for when the whitespace can be empty: ^foo%{SPACE}%{NUMBER:word} will match foo1, but ^foo%{SPACE}%{WORD:word} will not match foobar. WORD requires a word boundary at each end, and there is no word boundary between the o and the b.
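
The word boundary behaviour can be checked with a similar Ruby sketch. WORD is \b\w+\b in the stock grok patterns; DIGITS_STAND_IN is a simplified stand-in for NUMBER (an assumption, the real definition is longer), and the constant names are hypothetical.

```ruby
SPACE_PATTERN   = '(?:\s*)'   # expansion of SPACE
WORD_PATTERN    = '\b\w+\b'   # expansion of WORD
DIGITS_STAND_IN = '[0-9]+'    # simplified stand-in for NUMBER (assumption)

# Rough equivalents of ^foo%{SPACE}%{NUMBER:word} and ^foo%{SPACE}%{WORD:word}.
NUMBER_RX = Regexp.new("^foo#{SPACE_PATTERN}(?<word>#{DIGITS_STAND_IN})")
WORD_RX   = Regexp.new("^foo#{SPACE_PATTERN}(?<word>#{WORD_PATTERN})")

puts NUMBER_RX.match?("foo1")    # digits need no word boundary
puts WORD_RX.match?("foobar")    # no boundary between "o" and "b"
puts WORD_RX.match?("foo bar")   # the space creates a boundary before "b"
```

So if a field that follows %{SPACE} is a WORD, the input needs at least one whitespace character (or some other non-word character) before it for the pattern to match.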


@Badger It's working. Thank you for this useful message, very helpful. It resolved 2 days of struggle. Thanks