Broken Grok filter since V7.10.0

Hi,
I recently upgraded Logstash from V7.6.0 to V7.16.3 and have seen some odd behavior from the grok filter since then.

filter {
        grok {
                patterns_dir => "C:\software\ELK\config\patterns"

                match => { "message" => "%{CFLOG}" }
                add_tag => "has_traceback"
                tag_on_failure => []
                break_on_match => false
                match => { "uri_stem" => "%{REQUESTURL}" }
                match => { "uri_stem" => "%{DATATYPE}" }
                match => { "atime" => "%{STIME}" }
        }
}

The relevant custom patterns are:

CFLOG ^%{CFDATE:adate}\s+%{TIME:atime}\s+%{NOTSPACE:x_edge_location}\s+%{NUMBER:bytes}\s+%{IP:clientip}\s+%{WORD:verb}\s+%{NOTSPACE}\s+%{NOTSPACE:uri_stem}\s+%{NUMBER:response}\s+%{NOTSPACE:referer}\s+%{NOTSPACE:agent}\s+%{NOTSPACE:uri_query}\s+%{NOTSPACE:cookies}\s+%{NOTSPACE:x_edge_result_type}\s+%{NOTSPACE}\s+%{NOTSPACE}\s+%{NOTSPACE:cs_protocol}\s+%{NUMBER:cs_bytes}\s+%{NOTSPACE:time_taken}\s+%{NOTSPACE:x_forwarded_for}\s+%{NOTSPACE:ssl_protocol}\s+%{NOTSPACE:ssl_cipher}\s+%{WORD:x_edge_response_result_type}\s+%{NOTSPACE:cs_protocol_version}

REQUESTURL /(%{URLFRAGMENT:lang}/)?%{LEGTYPE:legtype}/%{LEGYEAR:legyear}/%{URLFRAGMENT:legnum}?%{GREEDYDATA:legrest}

It appears that the above rules worked in all versions prior to V7.10.0. However, the rule match => { "uri_stem" => "%{REQUESTURL}" } fails to match and returns empty matches in every version from V7.10.0 through V7.16.3.
I am wondering whether there were any breaking changes to the grok syntax in V7.10.0.
I also tried the grok rules in Kibana's Grok Debugger (V7.16.3), and they all work fine there.

Regards,
Landong

You are making an assumption about the order in which multiple occurrences of the match option are combined. Logstash does not guarantee the order, and I know it changed a while back, from first-to-last to last-to-first (I think), so grok is trying to match [uri_stem] before it has been created by matching [message].

As far as I know, order is preserved within a single hash (i.e. a single occurrence of the match option) but the order in which multiple hashes are merged is not defined.

So, do not use multiple occurrences. I would suggest trying

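    add_tag => "has_traceback"
    tag_on_failure => []
    break_on_match => false
    match => {
        "message" => "%{CFLOG}"
        # Both patterns for the same field go in one array rather than
        # repeating the "uri_stem" key within the hash.
        "uri_stem" => ["%{REQUESTURL}", "%{DATATYPE}"]
        "atime" => "%{STIME}"
    }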

If that does not work then split it into two groks - one for [message] and another for the rest.
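
A minimal sketch of the two-grok split, assuming the same patterns directory and custom patterns as in the question. Filters run in the order they appear in the configuration, so the second grok only sees the event after the first has created [uri_stem] and [atime]. Keeping add_tag on the first grok is my assumption about what the tag was meant to mark; move it if it should apply elsewhere.

    filter {
        # First grok: parse the raw line and create [uri_stem], [atime], etc.
        grok {
            patterns_dir => "C:\software\ELK\config\patterns"
            match => { "message" => "%{CFLOG}" }
            add_tag => "has_traceback"
            tag_on_failure => []
        }

        # Second grok: match against the fields created by the first one.
        grok {
            patterns_dir => "C:\software\ELK\config\patterns"
            break_on_match => false
            tag_on_failure => []
            match => {
                "uri_stem" => ["%{REQUESTURL}", "%{DATATYPE}"]
                "atime" => "%{STIME}"
            }
        }
    }

With this layout nothing depends on how multiple match options are merged, since each grok has exactly one.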

Thanks Badger,
I did split it into two groks, which produced the expected outcome.

Regards,

Landong
