Avoiding Field Reference Grammar in JSON Parsing

Hey all,

I have been processing stringified JSON using the json filter plugin for some time:

json {
    skip_on_invalid_json => true  # silently skip events whose source is not valid JSON
    source => "data"              # decode the stringified JSON held in the [data] field
}

The JSON being parsed is dynamically generated by another system. Occasionally, I run into the case where the stringified JSON contains keys that match the Field Reference Grammar, e.g. {"some[key]": "value"}. While not ideal, this is technically valid JSON. Unfortunately, since the json filter plugin uses event.set, these keys are interpreted as field references when the event is updated with the decoded JSON.
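To make the collision concrete, here is a minimal sketch of what effectively happens inside the json filter for such a key (the literal key and the standalone ruby filter are just for illustration):

filter {
  ruby {
    # Under the STRICT grammar this event.set does not create a field
    # literally named some[key]; it raises "Invalid FieldReference" and
    # takes the pipeline down -- the same call the json filter makes
    # for every key of the decoded object.
    code => "event.set('some[key]', 'value')"
  }
}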

Prior to 7.0.0, a field reference to something like some[key] would log a warning, but the JSON would still be decoded and the resulting object could be manipulated gracefully. Now, with STRICT as the only available Field Reference grammar option in 7.0.0, a key such as some[key] is illegal and results in all-out pipeline failure.

I must solve this problem from within Logstash, as I have no control over the incoming data. My possible solutions were to:

  1. Attempt to sanitize the encoded JSON string prior to decoding (slow; a sketch follows this list)
  2. Use a ruby filter or custom plugin to sanitize the decoded JSON prior to applying it to the event (faster than 1, but more to maintain)
  3. Look into updating the json filter to account for this collision
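For reference, option 1 might look something like the gsub pass below, run before the json filter. This is only a sketch: the pattern is mine, it assumes keys made of word characters, and a regex over serialized JSON will inevitably miss edge cases (escaped quotes, brackets in unusual places), which is part of why I consider it slow and hacky.

filter {
  mutate {
    # Rewrite keys of the form "foo[bar]" to "foobar" in the raw
    # string before decoding. Matching the quoted-key-plus-colon
    # shape avoids touching the [ and ] used by JSON arrays in values.
    gsub => [ "data", '"(\w+)\[(\w+)\]"\s*:', '"\1\2":' ]
  }
  json {
    skip_on_invalid_json => true
    source => "data"
  }
}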

At the end of the day, I am trying to decode a valid JSON string, so 1 and 2 seem pretty hacky for a common use case. At first glance, I don't think 3 can be accomplished without also extending the event API (or bypassing it, which is also not a good idea).

Is there a simple solution that I am missing? Do I have any other options for decoding a JSON string without keys accidentally being interpreted by the Field Reference Grammar? I still need access to elements in the parent object; is there any way I can keep those around and use the json codec plugin?

Any input would be appreciated! Thanks!


Same here. It used to be possible to mutate field names containing [ and/or ] into something acceptable, but that is no longer possible: the pipeline rejects the input before a mutate filter is ever triggered. Any suggestions on how we are supposed to deal with this in 7.0?


We are having the exact same issue:

[ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Invalid FieldReference: somefield[other_field]>


Not much movement on this :frowning:

Since it seems to be an issue for others, I thought I would post a temporary fix using a ruby filter (a la options 1 and 2 above). This is a temporary strategy that may or may not work depending on your use case, but at least it will stop pipeline crashes in Logstash 7.0.0 with the forced STRICT Field Reference grammar.

This filter will parse the JSON-encoded string in the [data] field, strip [ and ] from any Hash keys, and set the parsed object back onto the [data] field:

filter {
  ruby {
    # JSON is usually already loaded in the Logstash runtime, but
    # requiring it explicitly keeps the filter self-contained.
    init => "require 'json'"
    code => "
      # Recursively strip '[' and ']' from every Hash key so the
      # decoded object can be applied with event.set without tripping
      # the STRICT Field Reference grammar.
      def sanitize_field_reference(item)
        case item
        when Hash
          item.keys.each { |k| item[k.gsub(/[\[\]]/, '')] = sanitize_field_reference(item.delete(k)) }
          return item
        when Array
          return item.map { |e| sanitize_field_reference(e) }
        else
          return item
        end
      end

      event.set('[data]', sanitize_field_reference(JSON.parse(event.get('[data]'))))"
  }
}
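For example, an event whose [data] field holds the string {"some[key]": {"another[key]": "value"}} ends up with [data] set to the decoded object {"somekey" => {"anotherkey" => "value"}}, which event.set can apply without raising Invalid FieldReference. One caveat: sanitized keys can collide (e.g. "a[b]" and "ab" both map to "ab"), in which case the last one processed wins.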

I would still welcome a more maintainable or production-worthy answer for decoding valid JSON objects whose keys contain Field Reference Grammar tokens.

Cheers,

Hal
