Conditional formatting of JSON or serialized JSON

Apologies if this is obvious. I'm pulling Open edX logs from AWS CloudWatch using the cloudwatch_logs plugin (awesome!).

I have an "event" key in my incoming JSON, which contains arbitrary data: sometimes the value is a full JSON object and sometimes it's a string with serialized JSON (a quirk of Open edX logging).

So an incoming log line can look like this:

{ 
  "blah": "something",
  "event": { 
      "this": "that"
  }
}

And sometimes it can look like this:

{
  "blah": "something",
  "event": "{\"url\": \"some-url\"}"
}

Is there an approach anyone can recommend? Do I need to use conditionals?

Thanks in advance for any thoughts!

I tried to use a json filter, but that seems to cause errors when the value is already a JSON object:


input {
    cloudwatch_logs {
        log_group => [ "/edx/var/log/tracking" ]
        access_key_id => "(my key)"
        secret_access_key => "(my secret)"
        region => "us-west-1"
        codec => "json"
    }
}
filter {
  json {
    source => "event"
    target => "event"
  }
}
output {
    elasticsearch {
        hosts =>  [ "localhost:9200" ]
        index => "tracking"
    }
}

Some incoming logs fail with a warning like this:

[2017-10-24T22:41:29,045][WARN ][logstash.filters.json    ] Error parsing json {:source=>"event", :raw=>{}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO}

Or

[2017-10-24T23:11:03,256][WARN ][logstash.filters.json    ] Error parsing json {:source=>"event", :raw=>{"fullname"=>"Someone McStudent", "user_id"=>463, "email"=>"someone@example.com", "username"=>"someone"}, :exception=>java.lang.ClassCastException}
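In both warnings the :raw value is already a parsed object rather than a string. A rough analogy in plain Ruby (only an illustration of the type mismatch, not Logstash's actual internals) is that a JSON parser accepts a String but rejects a Hash:

```ruby
require 'json'

# Serialized form: a String containing JSON parses fine.
serialized = '{"url": "some-url"}'
parsed = JSON.parse(serialized)

# Already-parsed form: a Hash is not valid input for the parser.
already_parsed = { 'this' => 'that' }
begin
  JSON.parse(already_parsed)
  hash_rejected = false
rescue StandardError => e
  # The exact exception class may vary with the json library version.
  hash_rejected = true
  rejection_class = e.class
end

puts parsed.inspect
puts "Hash rejected: #{hash_rejected} (#{rejection_class})"
```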

I tried a filter like this so that only string values get parsed as JSON, but it doesn't seem to work (Logstash won't start properly, so perhaps there's a formatting issue in my filter):

filter {
    ruby {
        code => "
            if event['event'] is_a? String
                event.set['event_raw'] = event['event']
        "
    }
    json {
        source => "event_raw"
        target => "event"
        remove_field => "event_raw"
    }
}

You're on the right track in your second approach but there are multiple problems with your Ruby code. This should work (assuming Logstash 5+):

if event.get('event').is_a? String
  event.set('event_raw', event.get('event'))
end
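To try that guard outside Logstash, here is a minimal sketch using a stand-in event object. FakeEvent and copy_string_event are hypothetical names made up for this illustration; only get and set are modeled, and the real Logstash event class has more behavior than this.

```ruby
# Hypothetical stand-in for a Logstash event, just enough to run the snippet.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(key)
    @fields[key]
  end

  def set(key, value)
    @fields[key] = value
  end
end

# Same logic as the snippet above: copy the field only when it is a String.
def copy_string_event(event)
  if event.get('event').is_a? String
    event.set('event_raw', event.get('event'))
  end
  event
end

string_case = copy_string_event(
  FakeEvent.new({ 'blah' => 'something', 'event' => '{"url": "some-url"}' })
)
object_case = copy_string_event(
  FakeEvent.new({ 'blah' => 'something', 'event' => { 'this' => 'that' } })
)

puts string_case.get('event_raw').inspect
puts object_case.get('event_raw').inspect
```

Only the string-valued event ends up with an event_raw field, so a json filter reading from event_raw would never see an already-parsed object.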

Awesome. Thanks Magnus. I'll give that a shot.

Strange: now when I run Logstash, it keeps writing the same set of messages to the log every 20 seconds or so, with nothing in logstash.err and nothing saved to ES.



[2017-10-25T21:45:53,034][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2017-10-25T21:45:53,038][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2017-10-25T21:46:13,152][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2017-10-25T21:46:13,158][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2017-10-25T21:46:13,479][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2017-10-25T21:46:13,637][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2017-10-25T21:46:13,660][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2017-10-25T21:46:13,675][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}



When I run logstash with -t I get a Configuration OK.

Investigating...

I'm seeing this error in verbose logging

[2017-10-26T00:06:46,267][DEBUG][logstash.agent           ] 2017-10-26 00:06:46 +0000: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>

Based on this comment (https://github.com/elastic/logstash/issues/6463#issuecomment-311576211) I'm wondering if there is some kind of syntax error in that Ruby code that's causing logstash to silently fail.

Aha! Silly, I should have checked the API more closely. Set takes two arguments...

    ruby {
        code => "
            if event.get('event').is_a? String
                raw_content = event.get('event')
                event.set('event_raw', raw_content)
            end
        "
    }
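As a possible variation (not what this thread settled on), the string check and the parsing could be done in one step in Ruby instead of handing event_raw to a json filter. The sketch below is plain Ruby; normalize_event_payload is a hypothetical helper name, and inside a ruby filter its result would presumably be written back with event.set('event', ...).

```ruby
require 'json'

# Hypothetical helper: parse the payload if it is a JSON string,
# otherwise return it unchanged.
def normalize_event_payload(value)
  return value unless value.is_a?(String)

  JSON.parse(value)
rescue JSON::ParserError
  # Leave strings that are not valid JSON untouched.
  value
end

from_string = normalize_event_payload('{"url": "some-url"}')
from_object = normalize_event_payload({ 'this' => 'that' })
from_garbage = normalize_event_payload('not json')

puts from_string.inspect
puts from_object.inspect
puts from_garbage.inspect
```

The trade-off is that the json filter's own error handling is bypassed, so any failure handling has to be written by hand as in the rescue clause here.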

Interesting that logstash will fail silently for Ruby syntax errors.

Set takes two arguments...

Jeez, of course, sorry about that.

Not your fault. I should have had that right to start. Thanks for the help.
