Create temp metadata field using regex on message field and parse JSON

I need to extract the JSON object from the message field and parse it:

filter {
    grok { 
        match => { "message" => "(?<[@metadata][tempjson]>{.+})" } 
    }
    json {
        source => "[@metadata][tempjson]"
    }
}

The filter above works in 7.8, but production runs 6.4.1, where the same filter throws the following error:

Aug 11 10:49:25 logstash[24400]: [2020-08-11T10:49:25,354][ERROR][logstash.pipeline ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::FilterDelegator:0x7e242ff9 @metric_events_out=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: out value:0, @metric_events_in=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: in value:0, @metric_events_time=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: duration_in_millis value:0, @id="0b558f1ec526e9beddd9873771d6477a6f55d61976c1fb340737ebd85e3e5120", @klass=LogStash::Filters::Grok, @metric_events=#LogStash::Instrument::NamespacedMetric:0x32a505ed, @filter=<LogStash::Filters::Grok match=>{"message"=>"(?<[@metadata][tempjson]>{.+})"}, id=>"0b558f1ec526e9beddd9873771d6477a6f55d61976c1fb340737ebd85e3e5120", enable_metric=>true, periodic_flush=>false, patterns_files_glob=>"*", break_on_match=>true, named_captures_only=>true, keep_empty_captures=>false, tag_on_failure=>["_grokparsefailure"], timeout_millis=>30000, tag_on_timeout=>"_groktimeout">>", :error=>"invalid char in group name <[@metadata][tempjson]>: /(?<[@metadata][tempjson]>{.+})/m", :thread=>"#<Thread:0x2adaf67d run>"}

Aug 11 10:49:25 logstash[24400]: [2020-08-11T10:49:25,607][ERROR][logstash.pipeline ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<RegexpError: invalid char in group name <[@metadata][tempjson]>: /(?<[@metadata][tempjson]>{.+})/m>, :backtrace=>["org/jruby/RubyRegexp.java:928:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:127:in `compile'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:281:in `block in register'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:275:in `block in register'", "org/jruby/RubyHash.java:1343:in `each'", "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:270:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:242:in `register_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:253:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:253:in `register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:595:in `maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:263:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:200:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:160:in `block in start'"], :thread=>"#<Thread:0x2adaf67d run>"}

Does it work in 6.4.1 if you put the field at the top level instead of trying to put it under [@metadata]? That is:

"(?<tempjson>{.+})"

Yup, that's what I'm doing right now, and I remove that temporary field after parsing, but I'm hoping to avoid the step where I have to remove it. [@metadata] is exactly what I need, but it's not working in production.
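
One way to keep using [@metadata] on 6.4.1, not suggested in this thread, is to avoid the raw (?<name>...) syntax entirely: define the regex as a custom pattern with grok's pattern_definitions option and reference it with the %{PATTERN:field} form. As far as I know, grok translates %{...} references into its own internal capture names before compiling the regex, so the brackets in the field reference never reach the regex engine that raised "invalid char in group name". This is a sketch under that assumption: JSONOBJECT is a hypothetical pattern name, and I have not confirmed this against the grok plugin bundled with 6.4.1 (logstash-filter-grok 4.0.3, per the backtrace), so test it before relying on it.

```
filter {
  grok {
    # JSONOBJECT is a hypothetical custom pattern name wrapping the same regex used above
    pattern_definitions => { "JSONOBJECT" => "{.+}" }
    # The %{NAME:field} form accepts a field reference such as [@metadata][tempjson] as the target
    match => { "message" => "%{JSONOBJECT:[@metadata][tempjson]}" }
  }
  json {
    source => "[@metadata][tempjson]"
  }
}
```

If this works, there is nothing to clean up afterwards, because outputs do not include [@metadata] fields.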

If you add remove_field => [ "tempjson" ] to the json filter instead, it is applied only when parsing succeeds, so the field will be left on the event only if it is not valid JSON. That might be useful.
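
Applied to the top-level-field version of the filter, that suggestion would look roughly like the sketch below. It relies on remove_field, like the other common filter options, being applied only when the json filter succeeds; when parsing fails, the json filter instead adds its default failure tag, _jsonparsefailure, and tempjson stays on the event.

```
filter {
  grok {
    # Capture the JSON object into a temporary top-level field
    match => { "message" => "(?<tempjson>{.+})" }
  }
  json {
    source => "tempjson"
    # Only applied on a successful parse, so unparseable text is kept in tempjson
    remove_field => [ "tempjson" ]
  }
}
```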

I'm adding remove_field => ["tempjson"] in a mutate filter after parsing, like this:

      grok { 
        match => { "message" => "(?<tempjson>{.+})" } 
      }

      json {
        source => "tempjson"
      }

      mutate {
        remove_field => ["tempjson"]
      }

That will work, but it means the temporary field is removed unconditionally, so if some JSON ever fails to parse you will not be able to see what it was. That is why I suggested moving the remove_field => ["tempjson"] to the json filter.
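
To actually look at those failures, one option is a conditional on the failure tag. This sketch assumes the json filter's default tag_on_failure value (_jsonparsefailure) and that remove_field sits in the json filter, so failed events still carry tempjson; stdout with the rubydebug codec is just an illustrative destination, and it would sit alongside your normal outputs.

```
output {
  # Events the json filter could not parse keep tempjson, so it shows up here
  if "_jsonparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```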

Sure, that makes sense. I will do that, thank you!