Grok fails on a valid regex

Log:

    [logstash.javapipeline    ][main] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<RegexpError: unmatched close parenthesis: /(?<loglevel>DBG|INF|WRN)(?(?=(?:.*(?:X-Request-Id|CorrelationId)))(?:.*)(?<=X-Request-Id=.|CorrelationId=.)(?<request_id>[^ ]*))/m>, :backtrace=>["org/jruby/RubyRegexp.java:942:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:127:in `compile'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-grok-4.2.0/lib/logstash/filters/grok.rb:284:in `block in register'", "org/jruby/RubyArray.java:1814:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-grok-4.2.0/lib/logstash/filters/grok.rb:278:in `block in register'", "org/jruby/RubyHash.java:1428:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-grok-4.2.0/lib/logstash/filters/grok.rb:273:in `register'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:56:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:200:in `block in register_plugins'", "org/jruby/RubyArray.java:1814:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:199:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:502:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:212:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:154:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:109:in `block in start'"], "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x2c530ae9 run>"}

Regex in question:

    (?<loglevel>DBG|INF|WRN)(?(?=(?:.*(?:X-Request-Id|CorrelationId)))(?:.*)(?<=X-Request-Id=.|CorrelationId=.)(?<request_id>[^ ]*))

The regex is valid and can be tested at https://regex101.com/, for example.

Sample log line to test against:

    10:58AM DBG gateway-auth -> gateway-bff eX-Request-Id=e2d071784d6a1468f2b91fcf4973e806f eExchange=egateway eReplyTo=e eRoutingKey=egateway-bff ehandler=e"RQM message published"

Any suggestions?

Hi!

Well, speaking of regex, whether a pattern is valid depends on the regex flavour the tool supports.

Your example looks like a valid PCRE regex, but grok runs on JRuby and uses the Oniguruma regex engine, so you should test it first in tools that target that flavour, such as the Grok Debugger or Grok Constructor.

I found that neither of them shows more detail than the Logstash output you got, so another good option is Rubular. There, your original regex throws "Invalid conditional pattern".

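Adding a colon to the conditional block seems to work: (?:(?= instead of (?(?=
Note that this turns the conditional into a plain non-capturing group, so the pattern now matches only lines that contain X-Request-Id or CorrelationId.
Check it yourself in Rubular:

    (?<loglevel>DBG|INF|WRN)(?:(?=(?:.*(?:X-Request-Id|CorrelationId)))(?:.*)(?<=X-Request-Id=.|CorrelationId=.)(?<request_id>[^ ]*))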
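
If you would rather check this outside Logstash, here is a minimal Ruby sketch. Plain Ruby's regex engine is closely related to the one JRuby uses, but the error text may differ between them, so the snippet only reports whether compilation fails. The constant and method names are mine, purely for illustration; the two patterns and the sample line are the ones from this thread.

```ruby
# Sample line from the question, kept exactly as posted.
SAMPLE_LINE = '10:58AM DBG gateway-auth -> gateway-bff eX-Request-Id=e2d071784d6a1468f2b91fcf4973e806f eExchange=egateway eReplyTo=e eRoutingKey=egateway-bff ehandler=e"RQM message published"'

# Original pattern: a PCRE-style conditional whose condition is a lookahead.
PCRE_STYLE = '(?<loglevel>DBG|INF|WRN)(?(?=(?:.*(?:X-Request-Id|CorrelationId)))(?:.*)(?<=X-Request-Id=.|CorrelationId=.)(?<request_id>[^ ]*))'

# Same pattern with "(?:(?=" in place of "(?(?=".
RUBY_FRIENDLY = '(?<loglevel>DBG|INF|WRN)(?:(?=(?:.*(?:X-Request-Id|CorrelationId)))(?:.*)(?<=X-Request-Id=.|CorrelationId=.)(?<request_id>[^ ]*))'

# Returns nil when the pattern compiles, otherwise the engine's error message.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

puts "original: #{compile_error(PCRE_STYLE).inspect}"
puts "modified: #{compile_error(RUBY_FRIENDLY).inspect}"

# The modified pattern compiles, so its named captures can be read directly.
match = Regexp.new(RUBY_FRIENDLY).match(SAMPLE_LINE)
puts "loglevel:   #{match[:loglevel]}"
puts "request_id: #{match[:request_id]}"
```

The lookbehind consumes one character after the equals sign, so the captured request_id starts at the second character of the value as posted.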


That said, the expression looks overcomplicated (conditionals and lookbehinds) for extracting a couple of strings; I would advise starting over with a simpler approach.
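
As one possible starting point, here is a sketch of a simpler pattern in Ruby that avoids both the conditional and the lookbehind. It is only an assumption about what you need: it keeps the single "any character" after the equals sign from your original pattern, and it makes the request id optional so that lines without one still yield a log level. SIMPLE_PATTERN, extract and the second log line are made up for illustration.

```ruby
# Hypothetical simpler pattern: the level first, then an optional
# "X-Request-Id=" or "CorrelationId=" pair whose value is captured.
SIMPLE_PATTERN = /\b(?<loglevel>DBG|INF|WRN)\b(?:.*?(?:X-Request-Id|CorrelationId)=.(?<request_id>\S+))?/

# Sample line from the question, kept exactly as posted.
LINE_WITH_ID = '10:58AM DBG gateway-auth -> gateway-bff eX-Request-Id=e2d071784d6a1468f2b91fcf4973e806f eExchange=egateway eReplyTo=e eRoutingKey=egateway-bff ehandler=e"RQM message published"'

# Made-up line without any request id, to exercise the optional part.
LINE_WITHOUT_ID = '10:59AM INF gateway-bff ready'

# Returns a hash of the two fields, or nil when no log level is found.
def extract(line)
  m = SIMPLE_PATTERN.match(line)
  m && { loglevel: m[:loglevel], request_id: m[:request_id] }
end

p extract(LINE_WITH_ID)
p extract(LINE_WITHOUT_ID) # request_id is nil for this line
```

The optional group plays the role of the original conditional: when neither key is present, the group is skipped and request_id is simply left empty.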

It should also become much more efficient if you anchor the pattern to the beginning of the line; otherwise the engine retries the match from every position in the log message, scanning toward the end each time for the terms in your pattern.
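
To illustrate the anchoring point, here is a small Ruby sketch. It assumes every line starts with a single timestamp token followed by the level, as in the sample; if your format differs, the anchored prefix needs adjusting. The pattern names, the helper and the second log line are invented for the example.

```ruby
# Anchored: the level must be the second space-separated token of the line.
ANCHORED_LEVEL = /\A\S+ (?<loglevel>DBG|INF|WRN) /

# Unanchored: the engine may try every starting position in the line.
UNANCHORED_LEVEL = /(?<loglevel>DBG|INF|WRN)/

# Start of the sample line from the question.
ANCHOR_SAMPLE = '10:58AM DBG gateway-auth -> gateway-bff'

# Made-up line where a level keyword only appears in the middle of the text.
STRAY_LINE = 'retrying after WRN from upstream'

# Returns the captured level, or nil when the pattern does not match.
def level_of(pattern, line)
  m = pattern.match(line)
  m && m[:loglevel]
end

p level_of(ANCHORED_LEVEL, ANCHOR_SAMPLE)
p level_of(ANCHORED_LEVEL, STRAY_LINE)
p level_of(UNANCHORED_LEVEL, STRAY_LINE)
```

With the anchor, a line that does not have the expected prefix is rejected after a single attempt at the start of the string, and a keyword that happens to appear mid-message is not picked up by accident.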
