Have Logstash take a file containing an array of JSON objects

Hi all,

I've been trying to use various input and filter plugins to take a file containing an array of pretty-printed JSON objects and emit an event for each object. I've scoured Stack Overflow and this forum and found a couple of things that get me almost there, but I'm hoping you can shed some light on my issue.

Example input:
[
  {
    "foo": "bar",
    "baz": "buzz"
  },
  {
    "hi": "there",
    "how": "far"
  }
]

Ideal Output:
{
    "foo" => "bar",
    "baz" => "buzz"
}
{
    "hi" => "there",
    "how" => "far"
}

Here is my current configuration:

input {
  file {
    # Fold every line that does not start with "]" into the previous
    # lines, so the array body arrives as a single event
    codec => multiline {
      pattern => "^\]"
      negate => true
      what => previous
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Strip the surrounding square brackets from the buffered array
  mutate {
    gsub => [ "message", "\]", "",
              "message", "\[", "" ]
  }
  # Squash the pretty-printed JSON onto one line
  # (note: this also strips spaces inside string values)
  ruby {
    code => "event.set('message', event.get('message').gsub(/[ \n]/, ''))"
  }
  json {
    source => "message"
  }
}

This works when my JSON file contains an array of one object, but I'd like it to work with multiple objects in a comma-separated array. I've tried the split filter with field => "message", but that didn't get me one event per object. I think Ruby could be helpful here, but I don't know how to take one object at a time, squash it onto one line, and pass it through the json filter to turn it into its own event. A rough sketch of my split attempt is below.
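For reference, here's roughly what that attempt lookedked like. (A sketch of my own try, not a recommendation: when the field is a plain string, the split filter splits on its terminator, newline by default, so I got one event per line rather than one per JSON object.)

filter {
  split {
    # "message" is still a plain string here, so this splits on
    # newlines (the default terminator), not on JSON objects
    field => "message"
  }
}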

Any help would be greatly appreciated. Thanks in advance!

All right, so I managed to figure it out with Ruby, though I was hoping Logstash would do more of it out of the box. For anyone else who tries to do the same thing, here's my configuration:

input {
  file {
    # Fold every line that does not contain "]" into the line that
    # follows it, so the whole array arrives as a single event
    codec => multiline {
      pattern => "\]"
      negate => true
      what => next
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Parse the buffered text into a real Ruby array
  ruby {
    init => "require 'json'"
    code => "real_arr = JSON.parse(event.get('message'))
             event.set('split', real_arr)
    "
  }

  # One event per array element
  split {
    field => "split"
  }

  # Promote each object's keys to top-level fields
  ruby {
    code => "split = event.get('split')
             split.each do |k, val|
               event.set(k, val)
             end
    "
  }
}
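If you don't want the intermediate fields in the final events, a mutate at the end cleans them up. (A small optional addition of mine, not part of the config above.)

filter {
  mutate {
    # Drop the raw array text and the per-object copy left by the split
    remove_field => ["message", "split"]
  }
}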

You could perhaps do something like this?

input {
  generator {
    lines => ['[
               {"a":1},
               {"a":2}
              ]']
    count => 1
  }
}

filter {
  json {
    source => "message"
    target => "data"
  }

  split {
    field => "data"
  }
}

output {
  stdout { codec => rubydebug}
}
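Roughly, that produces one event per array element, each carrying its object under data. (Output trimmed by hand; the real rubydebug output also shows @timestamp, @version, and the generator's own fields.)

{
    "data" => {
        "a" => 1
    }
}
{
    "data" => {
        "a" => 2
    }
}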

There is a little-known side effect of setting an impossible delimiter in both the file input and the json_lines codec: the whole file is read as a single payload to the json_lines codec, which then creates an event from each object in your JSON array. However, bear in mind the potential memory usage, since the whole file is read into memory.

input {
  file {
    codec => json_lines {
      # A delimiter that never occurs in the file, so the codec
      # receives it all as one payload
      delimiter => "øøøøøø"
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Same impossible delimiter here, so the input reads the whole
    # file as one chunk
    delimiter => "øøøøøø"
  }
}

I have used this technique in this thread: Denormalize data within a log file

Thank you both for your replies! I actually ran into the memory issues you mentioned when loading the whole file into memory, so I modified my config accordingly. Leaving it here for future reference in case anyone else needs it!

input {
  file {
    # Each object starts on a line of two spaces followed by "{";
    # everything else is folded into the previous event
    codec => multiline {
      pattern => "^\s\s{"
      negate => true
      what => previous
      auto_flush_interval => 1
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  mutate {
    gsub => [ "message", "\[", "",
              "message", "\]", "",
              "message", "^\s\s},", "}" ]
  }

  # The first event from the input is "" so we shouldn't run the json filter on it
  if [message] != "" {
    json {
      source => "message"
    }
  }

  mutate {
    remove_field => ["message"]
  }
}
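With the example input from the top of the thread, the events coming out of this look roughly like the ideal output I posted. (Rubydebug formatting; metadata fields such as @timestamp and path trimmed here.)

{
    "foo" => "bar",
    "baz" => "buzz"
}
{
    "hi" => "there",
    "how" => "far"
}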
