Splitting a JSON file

Hi

I am totally new to Logstash, and I am having some issues when splitting a JSON file.

My file is very simple:

{
    "data": [
        {
            "name": "name1"
        },
        {
            "name": "name2"
        }
    ]
}

My filter is even simpler:

filter {
  split { field => "data" }
}

However, I am getting this error message:

Only String and Array types are splittable. field:data is of type = NilClass

Any idea what I am doing wrong?

Thanks

Hi,

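It looks like the data field does not exist on the event. The file input stores each line it reads as a plain string in the message field, so there is no data field until a json filter has parsed that string.
You should use this filter: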

filter {
  json {
    source => "message"
  }
  split {
    field => "data" 
  }  
}
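
To illustrate why the order matters, here is a minimal Python sketch of the idea, not Logstash's actual code. The event dict and the json_filter and split_filter helpers are hypothetical stand-ins for the real event and plugins.

```python
import json

# Hypothetical stand-in for a Logstash event: the input only sets "message".
event = {"message": '{"data": [{"name": "name1"}, {"name": "name2"}]}'}


def json_filter(event, source="message"):
    """Parse the JSON text in event[source] and merge its keys into the event."""
    parsed = json.loads(event[source])
    return {**event, **parsed}


def split_filter(event, field="data"):
    """Return one copy of the event per element of event[field]."""
    value = event.get(field)
    if not isinstance(value, (str, list)):
        # Mirrors the spirit of the error in this thread: the field is missing.
        raise TypeError(f"field:{field} is of type = {type(value).__name__}")
    items = value.split("\n") if isinstance(value, str) else value
    return [{**event, field: item} for item in items]


parsed_event = json_filter(event)          # now has a "data" list
split_events = split_filter(parsed_event)  # one event per array element
```

Calling split_filter(event) on the unparsed event fails because "data" is missing, which is the situation behind the NilClass message.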

Cad.

Thanks

I changed that, and now it is not doing anything at all:

input {
  file{
    path => "/Users/javi/Downloads/logstash-8.1.3/mytest.json"
    start_position => "beginning"
    sincedb_path => ".sincedb"
  }
}

filter {
  json {
    source => "message"
  }
  split {
    field => "data" 
  }  
}

output {
 file {
   path => "/Users/javi/Downloads/logstash-8.1.3/testmutated3.json"
 }
}

This is the last line I have in the console:

[2022-04-28T12:01:48,716][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}

This is just an information message.

Can you edit your output section like this:

output {
  file {
     path => "/Users/javi/Downloads/logstash-8.1.3/testmutated3.json"
  }
  stdout { codec => json }
}

And share what is printed.
Cad

Hi

Yes, I know this is the info message. What I mean is that Logstash doesn't do anything; it hangs at that point.

I have noticed that this happens when I put the JSON text on one line. If I format the JSON file as I pasted it in the original post, that is when I get the errors. This is what I get with your output:


[2022-04-28T14:39:12,215][WARN ][logstash.filters.json    ][main][3fb410f4c4226c62d48f6393d19008a691ac547d52f3b83f0d9a28ff8bd10c39] Error parsing json {:source=>"message", :raw=>"    \"data\": [", :exception=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: (byte[])"    "data": ["; line: 1, column: 12]>}

[2022-04-28T14:39:12,219][WARN ][logstash.filters.split   ][main][3b78a77ee766469bb69c5822553d6a87e8c3bb48248d9f2a5fa111408c3d61eb] Only String and Array types are splittable. field:data is of type = NilClass

So, my gut feeling is that Logstash doesn't like formatted JSON, and this is why I am getting the error. But why is it not doing anything with the one-liner?

Has any of you tested this and had the same problem?

This is interesting: I changed the output to rubydebug and got the output below.

It looks like it is parsing (or at least trying to parse) the message, but with a lot of errors. It is also escaping my JSON fields (a \ is added before the double quotes of the name field):


{
          "tags" => [
        [0] "_jsonparsefailure",
        [1] "_split_type_failure"
    ],
          "host" => {
        "name" => "javi-mac"
    },
    "@timestamp" => 2022-04-28T14:06:06.822039Z,
           "log" => {
        "file" => {
            "path" => "/Users/javi/Downloads/logstash-8.1.3/mytestjavi3.json"
        }
    },
       "message" => "{",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_split_type_failure"
    ],
          "name" => "name1",
          "host" => {
        "name" => "javi-mac"
    },
    "@timestamp" => 2022-04-28T14:06:06.854082Z,
           "log" => {
        "file" => {
            "path" => "/Users/javi/Downloads/logstash-8.1.3/mytestjavi3.json"
        }
    },
      "@version" => "1",
         "event" => {
        "original" => "        {\"name\": \"name1\"},"
    }
}
{
          "tags" => [
        [0] "_jsonparsefailure",
        [1] "_split_type_failure"
    ],
          "host" => {
        "name" => "javi-mac"
    },
    "@timestamp" => 2022-04-28T14:06:06.860157Z,
           "log" => {
        "file" => {
            "path" => "/Users/javi/Downloads/logstash-8.1.3/mytestjavi3.json"
        }
    },
       "message" => "    ]",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_jsonparsefailure",
        [1] "_split_type_failure"
    ],
          "host" => {
        "name" => "javi-mac"
    },
    "@timestamp" => 2022-04-28T14:06:06.845916Z,
           "log" => {
        "file" => {
            "path" => "/Users/javi/Downloads/logstash-8.1.3/mytestjavi3.json"
        }
    },
       "message" => "    \"data\": [",
      "@version" => "1"
}
{
          "tags" => [
        [0] "_split_type_failure"
    ],
          "name" => "name2",
          "host" => {
        "name" => "javi-mac"
    },
    "@timestamp" => 2022-04-28T14:06:06.857174Z,
           "log" => {
        "file" => {
            "path" => "/Users/javi/Downloads/logstash-8.1.3/mytestjavi3.json"
        }
    },
      "@version" => "1",
         "event" => {
        "original" => "        {\"name\": \"name2\"},"
    }
}


Now it is clear: this happens because you are using the file input plugin.
If you read the documentation of the file input plugin, it says:

By default, each event is assumed to be one line and a line is taken to be the text before a newline character. If you would like to join multiple log lines into one event, you’ll want to use the multiline codec.

So basically the json filter can't parse the input correctly, because the file input plugin splits the JSON into one event per line when it is pretty-printed in the source file.
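
A small Python sketch of that effect, using the file from the original post. The parse_or_none helper is hypothetical, and Python's parser is only a stand-in for the one Logstash uses.

```python
import json

# The pretty-printed file from the original post.
pretty = """{
    "data": [
        {
            "name": "name1"
        },
        {
            "name": "name2"
        }
    ]
}"""


def parse_or_none(text):
    """Return the parsed JSON value, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# One "event" per line, as the file input produces by default.
per_line = [parse_or_none(line) for line in pretty.splitlines()]

# The whole file treated as a single event.
whole = parse_or_none(pretty)
```

Every entry in per_line is None because no single line is a complete JSON document, while whole contains the data array. Python's parser is stricter than what the rubydebug output in this thread suggests for Logstash (there, a line such as {"name": "name1"}, was apparently accepted), so treat this only as an illustration of the line splitting.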
So, as recommended by the documentation, you have to use the multiline codec to get the entire file content as one event. Your config file needs to look like this:

input {
  file{
    path => "/Users/javi/Downloads/logstash-8.1.3/mytest.json"
    start_position => "beginning"
    sincedb_path => ".sincedb"
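    codec => multiline {
      # Each line not starting with "azertyqwerty" is merged with the previous one.
      pattern => "^azertyqwerty"
      negate => true
      what => "previous"
      # Flush the buffered event after 1 second without new lines;
      # otherwise the last (here: only) event is never emitted.
      auto_flush_interval => 1
    }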
  }
}

filter {
  json {
    source => "message"
  }
  split {
    field => "data" 
  }  
}

output {
 file {
   path => "/Users/javi/Downloads/logstash-8.1.3/testmutated3.json"
 }
}
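
To make the grouping rule concrete, here is a rough Python sketch of how a pattern with negate and what => "previous" joins lines. The multiline_previous function is a hypothetical simplification, not the codec's implementation. In particular, it flushes at end of input, whereas the real codec waits for a flush.

```python
import json
import re


def multiline_previous(lines, pattern, negate=True):
    """Join lines into events, appending a line to the previous event when it
    matches the pattern (or, with negate=True, when it does not match)."""
    regex = re.compile(pattern)
    events, buffer = [], []
    for line in lines:
        matched = bool(regex.search(line))
        belongs_to_previous = (not matched) if negate else matched
        if buffer and not belongs_to_previous:
            # This line starts a new event, so emit what was buffered.
            events.append("\n".join(buffer))
            buffer = []
        buffer.append(line)
    if buffer:
        # Simplification: emit the tail at end of input.
        events.append("\n".join(buffer))
    return events


lines = ['{', '    "data": [', '        {"name": "name1"},',
         '        {"name": "name2"}', '    ]', '}']

# No line starts with "azertyqwerty", so everything ends up in one event.
events = multiline_previous(lines, r"^azertyqwerty")
document = json.loads(events[0])
```

With a pattern that never matches, events holds a single string containing the whole file, which a JSON parser can then read in one go.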

The output contains \ because the field holds a string that itself contains the special character ". rubydebug escapes those inner quotes when it prints the string; the data itself is unchanged.

Cad.

Thanks! This starts to make sense.

If I change this to a single line

{ "data": [ { "name": "name1"},{"name": "name2"} ]}

it should work without the multiline codec, shouldn't it? Why is it not working?

Yes

Can you share the result you get?

It doesn't process the file. I have noticed that sometimes, if I add an extra blank line at the end or the beginning of the file, it processes it. It's quite weird.

You should read this thread: "Logstash won't read 1 line input file" (Elastic Stack / Logstash, Discuss the Elastic Stack).
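
One explanation consistent with the observation that adding a blank line at the end helps: a line-oriented reader only emits a line once it has seen the newline that terminates it. The Python sketch below shows that buffering idea. LineBuffer is a hypothetical class, and the assumption that the file input behaves this way for a file with no trailing newline is mine, not something confirmed in this thread.

```python
class LineBuffer:
    """Hypothetical newline-delimited buffer: text is held back until the
    delimiter that ends the line has arrived."""

    def __init__(self, delimiter="\n"):
        self.delimiter = delimiter
        self.pending = ""

    def extract(self, chunk):
        """Return the complete lines seen so far; keep the unterminated tail."""
        self.pending += chunk
        *complete, self.pending = self.pending.split(self.delimiter)
        return complete


one_liner = '{ "data": [ { "name": "name1"},{"name": "name2"} ]}'

buf = LineBuffer()
without_newline = buf.extract(one_liner)  # nothing emitted yet
after_newline = buf.extract("\n")         # the line is emitted now
```

Under that assumption, a one-line file saved without a final newline never produces an event, and appending a newline releases it.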

Cad.
