Trim log line before JSON codec in Input?

I was using the cloudwatch_logs plugin to read JSON-based logs from an AWS CloudWatch log stream. Those log lines were read fine, with the keys in the JSON arriving as fields on my event and accessible in the filter section. (The logs come from an Open edX installation on my server; they aren't generated by CloudWatch itself.)

However, I've had to switch to reading these logs from S3. When you save CloudWatch logs to S3, each line gets stored as a timestamp followed by the actual JSON string, like this:

2018-01-01T12:21:11.316Z {"username": "blah", .... }

Is there a way to trim off the timestamp in my input section so I can still use a JSON codec? (Don't need that initial timestamp, there's one in the actual JSON log.)

Or do I have to transform the message into JSON in the filter section (using dissect?)? If so, I'm not sure how to do that so the keys in the log JSON end up as top-level fields on the event, so I can still write a conditional such as if [username] == "blah".


I thought I could do something like:

    ruby {
        code => "event['message'] = event['message'][25..-1]"
    }

    json {
        source => "message"
        target => "message"
    }

but that doesn't work. (If it did, I'm assuming Logstash would have some internal way of automatically mapping the keys in message to the fields that the event object exposes, such that I could do the [username] conditional...but maybe I have that wrong too.)

Thanks for any suggestions.

Hi @danielmcquillen,

Something like this should work

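  if [my_key] == "something" {
    # Split the leading timestamp from the JSON payload
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{GREEDYDATA:json_data}" }
    }
    # Parse the payload; with no target set, its keys land at the top level of the event
    json {
      source => "json_data"
    }
  }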

You can't really do if [username] == "blah" before you have parsed the JSON, because that field doesn't exist on the event until the json filter has run.
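If it helps to see the idea outside Logstash, here is a rough plain-Ruby sketch of the same two steps: peel off the leading timestamp, then parse what is left as JSON. It is only an illustration, not what grok does internally. The helper name split_cloudwatch_line and the PREFIX regex are made up for this example, and the regex is a simplified stand-in for TIMESTAMP_ISO8601 that assumes UTC "Z" timestamps like the one in the sample line.

```ruby
require 'json'

# Simplified stand-in for grok's TIMESTAMP_ISO8601 (assumption: UTC "Z"
# timestamps with optional fractional seconds, as in the sample line).
PREFIX = /\A(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)\s+(?<json_data>.*)\z/m

# Hypothetical helper: returns [timestamp, parsed_hash], or nil when the
# line does not start with a timestamp. JSON.parse raises
# JSON::ParserError if the remainder is not valid JSON.
def split_cloudwatch_line(line)
  m = PREFIX.match(line)
  return nil unless m
  [m[:timestamp], JSON.parse(m[:json_data])]
end

ts, fields = split_cloudwatch_line('2018-01-01T12:21:11.316Z {"username": "blah"}')
puts ts
puts fields['username']
```

Matching the timestamp by pattern rather than slicing at a fixed offset like [25..-1] means the split should still work if the timestamp's length changes, for example when it has no fractional seconds.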

Awesome. Thanks A_B!
