Another possible approach is to simply turn your data into ndjson (newline-delimited JSON) using jq; then you would not need to worry about multiline handling at all.
jq is a very powerful command-line JSON tool.
Say your file looks like this (I put 2 objects in, but it will work with 1 to n):
$ cat json-pretty-sample.json
[
{
"year": 2013,
"title": "Rush",
"info": {
"directors": [
"Ron Howard"
],
"release_date": "2013-09-02T00:00:00Z",
"rating": 8.3,
"genres": [
"Action",
"Biography",
"Drama",
"Sport"
],
"image_url": "http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg",
"plot": "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.",
"rank": 2,
"running_time_secs": 7380,
"actors": [
"Daniel Bruhl",
"Chris Hemsworth",
"Olivia Wilde"
]
}
},
{
"year": 2015,
"title": "Other Movie",
"info": {
"directors": [
"Ron Howard"
],
"release_date": "2013-09-02T00:00:00Z",
"rating": 8.3,
"genres": [
"Action",
"Biography",
"Drama",
"Sport"
],
"image_url": "http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg",
"plot": "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.",
"rank": 2,
"running_time_secs": 7380,
"actors": [
"Daniel Bruhl",
"Chris Hemsworth",
"Olivia Wilde"
]
}
}
]
I can simply run jq and tell it to write ndjson.
This command says: run jq, output in compact form with -c (one object per line, i.e. ndjson), and write out each element of the top-level array with .[]
$ cat json-pretty-sample.json | jq -c '.[]' > sample.ndjson
Now the file is ndjson, which Logstash can easily read without any multiline handling.
$ cat sample.ndjson
{"year":2013,"title":"Rush","info":{"directors":["Ron Howard"],"release_date":"2013-09-02T00:00:00Z","rating":8.3,"genres":["Action","Biography","Drama","Sport"],"image_url":"http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg","plot":"A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.","rank":2,"running_time_secs":7380,"actors":["Daniel Bruhl","Chris Hemsworth","Olivia Wilde"]}}
{"year":2015,"title":"Other Movie","info":{"directors":["Ron Howard"],"release_date":"2013-09-02T00:00:00Z","rating":8.3,"genres":["Action","Biography","Drama","Sport"],"image_url":"http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg","plot":"A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda.","rank":2,"running_time_secs":7380,"actors":["Daniel Bruhl","Chris Hemsworth","Olivia Wilde"]}}
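If jq is not available, the same conversion can be sketched in Python. This is only a rough equivalent, not part of the workflow above: the function name `array_to_ndjson` is made up for illustration, and it assumes the input is a single top-level JSON array small enough to load into memory.

```python
import json


def array_to_ndjson(text):
    """Convert JSON array text into ndjson: one compact JSON value per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # separators=(",", ":") removes the padding spaces, similar to jq's -c output;
    # ensure_ascii=False keeps non-ASCII characters unescaped.
    lines = [
        json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        for record in records
    ]
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Passing it the contents of json-pretty-sample.json and writing the returned string to sample.ndjson should give one line per movie, like the jq output above.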
This conf reads the ndjson file and parses everything just fine.
Note I used the json codec to read the file. I am often confused between the two, json and json_lines, but for reading ndjson from a file, codec => "json" gets the job done: the file input already hands the codec one line at a time, while json_lines is meant for streaming inputs that are not already split into lines. You can also use the json filter instead, but this is pretty direct.
input {
  file {
    path => "/Users/sbrown/workspace/sample-data/discuss/logstash/sample.ndjson"
    start_position => "beginning"
    codec => "json"
    sincedb_path => "/dev/null"
  }
}
output {
  stdout { codec => rubydebug }
}
Output
{
"year" => 2015,
"title" => "Other Movie",
"@version" => "1",
"info" => {
"genres" => [
[0] "Action",
[1] "Biography",
[2] "Drama",
[3] "Sport"
],
"rank" => 2,
"running_time_secs" => 7380,
"actors" => [
[0] "Daniel Bruhl",
[1] "Chris Hemsworth",
[2] "Olivia Wilde"
],
"release_date" => "2013-09-02T00:00:00Z",
"image_url" => "http://ia.media-imdb.com/images/M/MV5BMTQyMDE0MTY0OV5BMl5BanBnXkFtZTcwMjI2OTI0OQ@@._V1_SX400_.jpg",
"directors" => [
[0] "Ron Howard"
],
"rating" => 8.3,
"plot" => "A re-creation of the merciless 1970s rivalry between Formula One rivals James Hunt and Niki Lauda."
},
"@timestamp" => 2021-12-29T17:59:23.458Z,
"host" => "hyperion",
"path" => "/Users/sbrown/workspace/sample-data/discuss/logstash/sample.ndjson"
}
.......
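Since each line of the file becomes its own event, it can help to confirm that every line parses as standalone JSON before pointing Logstash at it. Here is a small sketch of such a check in Python; the helper name `check_ndjson` is hypothetical, and it only tests that lines are parseable, not that they match any particular schema.

```python
import json


def check_ndjson(lines):
    """Return (line_number, error_message) for each line that is not standalone JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems
```

For example, opening sample.ndjson and passing the file object to `check_ndjson` should return an empty list, while running it on the pretty-printed json-pretty-sample.json would flag most lines, which is the reason the original file needed multiline handling in the first place.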