Have Logstash take a file containing an array of JSON objects

Hi all,

I've been trying to use various input and filter plugins to take a file containing an array of pretty-printed JSON objects and emit an event for each object. I've scoured Stack Overflow and this forum and found a couple of things that get me almost there, but I'm hoping you can shed some light on my issue.

Example input:
[
  {
    "foo": "bar",
    "baz": "buzz"
  },
  {
    "hi": "there",
    "how": "far"
  }
]

Ideal Output:
{
    "foo" => "bar",
    "baz" => "buzz"
}
{
    "hi" => "there",
    "how" => "far"
}

Here is my current configuration:

input {
  file {
    # Fold every line that does not start with "]" into the previous
    # lines, so the array body arrives as a single event
    codec => multiline {
      pattern => "^\]"
      negate => true
      what => previous
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Strip the surrounding square brackets from the buffered array
  mutate {
    gsub => [ "message", "\]", "",
              "message", "\[", "" ]
  }
  # Squash the pretty-printed JSON onto one line
  # (note: this also strips spaces inside string values)
  ruby {
    code => "event.set('message', event.get('message').gsub(/[ \n]/, ''))"
  }
  json {
    source => "message"
  }
}

This works when my JSON file contains an array of one object, but I'd like it to work with multiple objects in a comma-separated array. I've tried the split filter with field => "message", but that didn't get me one event per object. I think Ruby could be helpful here, but I don't know how to take one object at a time, squash it onto one line, and pass it through the json filter to turn it into its own event. A rough sketch of my split attempt is below.
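For reference, here's roughly what that attempt lookedked like. (A sketch of my own try, not a recommendation: when the field is a plain string, the split filter splits on its terminator, newline by default, so I got one event per line rather than one per JSON object.)

filter {
  split {
    # "message" is still a plain string here, so this splits on
    # newlines (the default terminator), not on JSON objects
    field => "message"
  }
}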

Any help would be greatly appreciated. Thanks in advance!

All right, so I managed to figure it out with Ruby, though I was hoping Logstash would do more of it out of the box. For anyone else who tries to do the same thing, here's my configuration:

input {
  file {
    # Fold every line that does not contain "]" into the line that
    # follows it, so the whole array arrives as a single event
    codec => multiline {
      pattern => "\]"
      negate => true
      what => next
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Parse the buffered text into a real Ruby array
  ruby {
    init => "require 'json'"
    code => "real_arr = JSON.parse(event.get('message'))
             event.set('split', real_arr)
    "
  }

  # One event per array element
  split {
    field => "split"
  }

  # Promote each object's keys to top-level fields
  ruby {
    code => "split = event.get('split')
             split.each do |k, val|
               event.set(k, val)
             end
    "
  }
}
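If you don't want the intermediate fields in the final events, a mutate at the end cleans them up. (A small optional addition of mine, not part of the config above.)

filter {
  mutate {
    # Drop the raw array text and the per-object copy left by the split
    remove_field => ["message", "split"]
  }
}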

You could perhaps do something like this?

input {
  generator {
    lines => ['[
               {"a":1},
               {"a":2}
              ]']
    count => 1
  }
}

filter {
  json {
    source => "message"
    target => "data"
  }

  split {
    field => "data"
  }
}

output {
  stdout { codec => rubydebug}
}
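Roughly, that produces one event per array element, each carrying its object under data. (Output trimmed by hand; the real rubydebug output also shows @timestamp, @version, and the generator's own fields.)

{
    "data" => {
        "a" => 1
    }
}
{
    "data" => {
        "a" => 2
    }
}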

There is a little-known side effect of setting an impossible delimiter in both the file input and the json_lines codec: the whole file is read as a single payload to the json_lines codec, which then creates an event from each object in your JSON array. However, bear in mind the potential memory usage, since the whole file is read into memory.

input {
  file {
    codec => json_lines {
      # A delimiter that never occurs in the file, so the codec
      # receives it all as one payload
      delimiter => "øøøøøø"
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Same impossible delimiter here, so the input reads the whole
    # file as one chunk
    delimiter => "øøøøøø"
  }
}

I have used this technique in this thread: Denormalize data within a log file

Thank you both for your replies! I actually ran into the memory issues you mentioned when loading the whole file into memory, so I modified my config accordingly. Leaving it here for future reference in case anyone else needs it!

input {
  file {
    # Each object starts on a line of two spaces followed by "{";
    # everything else is folded into the previous event
    codec => multiline {
      pattern => "^\s\s{"
      negate => true
      what => previous
      auto_flush_interval => 1
    }
    path => "path/to/json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  mutate {
    gsub => [ "message", "\[", "",
              "message", "\]", "",
              "message", "^\s\s},", "}" ]
  }

  # The first event from the input is "" so we shouldn't run the json filter on it
  if [message] != "" {
    json {
      source => "message"
    }
  }

  mutate {
    remove_field => ["message"]
  }
}
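With the example input from the top of the thread, the events coming out of this look roughly like the ideal output I posted. (Rubydebug formatting; metadata fields such as @timestamp and path trimmed here.)

{
    "foo" => "bar",
    "baz" => "buzz"
}
{
    "hi" => "there",
    "how" => "far"
}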
