Best way to parse json input?

I have JSON in a file in this format:

{
   events : [ {
       prop1 : val1,
       prop2 : val2
   }, {
      prop1 : val3,
      prop2 : val4
   } ]
}

I've tried the multiline codec with no success. I also tried a mutate filter to remove the \n characters, again with no success. There seem to be multiple ways of doing this, but I can't get any of them to work.

Any help appreciated.


You should be able to use json_lines here.
Can you share the configs you tried?
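For reference, json_lines assumes one complete JSON document per line, i.e. newline-delimited input like:

```
{"prop1": "val1", "prop2": "val2"}
{"prop1": "val3", "prop2": "val4"}
```

A pretty-printed document spread over several lines won't match that shape.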

I also expected json_lines to work, but not even this trivial example does:

$ cat test.config 
input { stdin { codec => json_lines } }
output { stdout { codec => rubydebug } }
$ cat data 
{
   "foo": "bar"
}
$ /opt/logstash/bin/logstash -f test.config < data
Logstash startup completed
{
       "message" => "{",
          "tags" => [
        [0] "_jsonparsefailure"
    ],
      "@version" => "1",
    "@timestamp" => "2015-09-23T05:47:37.345Z",
          "host" => "lnxmagnusbk"
}
A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::Stdin codec=><LogStash::Codecs::JSONLines charset=>"UTF-8">, debug=>false>
  Error: string not matched {:level=>:error}
Logstash shutdown completed

Reading the documentation I find it quite ambiguous, and comparing the source code with that of the json codec doesn't really help either. It seems the difference between json and json_lines is that the latter waits for a newline character before attempting to unmarshal the data, while the former just grabs and unmarshals whatever it can get its hands on.
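Consistent with that reading, the same trivial config should succeed once the document is compacted onto a single line first (a sketch, assuming jq is available for the compaction step):

```
# Compact the pretty-printed file to one document per line first, e.g.:
#   jq -c . data > data.ndjson
# then the original test config can parse it:
input  { stdin { codec => json_lines } }
output { stdout { codec => rubydebug } }
```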

Thanks for replying. My config is:

input{
  file{
    codec => json_lines
    sincedb_path => "/dev/null"
    path => "file.json"
    start_position => "beginning"
  }
}
output{
  stdout{codec => rubydebug}
}

But when I run "logstash --verbose -f file.conf" all I get is:

Registering file input {path=>["file.json"], :level=>:info}
Pipeline started {:level=>:info}
Logstash startup completed

Nothing else, file not processed.
I switched json_lines to json in my config and I get:

A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::File path=...
Error: string not matched {:level=>:error}

I'm using Logstash-1.5.4.

This post seems to indicate that json_lines does not work with the file input: https://github.com/logstash-plugins/logstash-codec-json_lines/issues/7

Any ideas?

Based on https://gist.github.com/shurane/60eee09eeee15a50f289, I changed my config to:

input{
  exec{
    command => "cat file.json"
    codec => json_lines
    interval => 60
  }
}
output{
  stdout{codec => rubydebug}
}

...with this result:

A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Exec command=>...
Error: string not matched {:level=>:error}
{
  "message" => "}{\r",
  "tags" => ...
}

Am I correct in thinking the json_lines codec looks for \n and fails when it encounters a \r?

I actually don't think the json_lines codec supports the multiline JSON documents that you have. I suspect the codec is for single-line JSON documents delimited by newline characters.

Yeah, I think you're right. I think I can get my data as a single line (not pretty-printed), though, so where does that leave me? What's the best input to use with single-line JSON? Eventually I need to fetch the JSON from a REST service using something like http_poller, but I couldn't get it working over HTTPS (does http_poller handle HTTPS?). In the meantime I have some of the JSON to test with, and I'm just trying to get it into Elasticsearch somehow. So far the file input with the json_lines codec is out, so I'll try flattening the JSON to one line and experimenting with other configs. Any tips appreciated!
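For the eventual REST step, an http_poller sketch might look like the following. This is untested here: the URL is hypothetical, and HTTPS endpoints generally need the server's CA certificate configured, e.g. via the plugin's cacert option.

```
input {
  http_poller {
    # Hypothetical endpoint; replace with the real REST service URL.
    urls     => { events => "https://example.com/api/events" }
    codec    => json
    interval => 60
    # cacert => "/path/to/ca.crt"   # often needed for https endpoints
  }
}
output {
  stdout { codec => rubydebug }
}
```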

I've made some progress. I reverted my file.json to its original, multiline format and changed my config to:

input{
  exec{
    command => "cat file.json"
    codec => json
    interval => 60
  }
}
output{
  stdout{codec => rubydebug}
}

...and it processed the whole file. When I indexed it into Elasticsearch, though, I saw it created only one event. Not ideal, because the document is a JSON object with an array of 1000 events. I'll see now if I can make it appear that way in Kibana.
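One possible way to turn that single event into many, which I haven't tried here, is the split filter on the array field:

```
filter {
  # Emit one event per element of the `events` array,
  # instead of a single event holding the whole array.
  split { field => "events" }
}
```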

I found no sensible way to view a single event containing multiple events in Kibana. I changed my data format from:

{
   events : [ {
       prop1 : val1,
       prop2 : val2
   }, {
      prop1 : val3,
      prop2 : val4
   } ]
}

... to a simple array rather than an object:

[ {
  prop1 : val1,
  prop2 : val2
}, {
  prop1 : val3,
  prop2 : val4
}]

... and the indexing worked properly, 1000 events!
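For completeness, the whole pipeline ends up looking roughly like this. The elasticsearch output is a sketch: on Logstash 1.5.x the option is `host`, while later releases use `hosts`.

```
input {
  exec {
    command  => "cat file.json"   # file.json now holds a top-level JSON array
    codec    => json
    interval => 60
  }
}
output {
  # Sketch: adjust the host/hosts option to match your Logstash version.
  elasticsearch { host => "localhost" }
  stdout { codec => rubydebug }
}
```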

I am having problems getting logstash to read a file containing a json array. What did your input configuration eventually look like to read in the array?

Thanks,

Nathan

Please start your own thread for this.

Hello,
Is there any chance of making this work when Logstash runs as a service (in the background)? For me it only works when started from cmd (a console). After a server restart it won't start automatically in a console, only as a service, and in service mode it cannot parse the JSON file. I'm talking about Windows here; on Linux it can probably work in both cases.