Hello,
I'm new to Elastic and am trying to parse JSON files into separate fields so that I can build statistics from them in Kibana.
Here is a sample:
{
  "info": {
    "generated_on": "2017-12-03 08:41:42.057563",
    "slice": "0-999",
    "version": "v1"
  },
  "playlists": [
    {
      "name": "Rock",
      "collaborative": "false",
      "pid": 0,
      "modified_at": 1493424000,
      "num_tracks": 22,
      "num_albums": 27,
      "num_followers": 1,
      "tracks": [
        {
          "pos": 0,
          "artist_name": "Michael Jackson",
          "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
          "artist_uri": "spotify:d5F5d7go1WT98tk",
          "track_name": "Song",
          "album_uri": "spotify:album:6vV5Udzzf4Qo2I9K",
          "duration_ms": 226863,
          "album_name": "The Cookbook"
        }],
      "num_edits": 34,
      "duration_ms": 9065801,
      "num_artists": 37
    },
    {
      "name": "Jazz",
      "collaborative": "false",
      "pid": 0,
      "modified_at": 1493424000,
      "num_tracks": 22,
      "num_albums": 27,
      "num_followers": 1,
      "tracks": [
        {
          "pos": 0,
          "artist_name": "Whatever",
          "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
          "artist_uri": "spotify:d5F5d7go1WT98tk",
          "track_name": "Song",
          "album_uri": "spotify:album:6vV5Udzzf4Qo2I9K",
          "duration_ms": 226863,
          "album_name": "The Cookbook"
        }],
      "num_edits": 34,
      "duration_ms": 9065801,
      "num_artists": 37
    }
  ]
}
I have managed to parse the sample above with this Logstash configuration:
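input {
  file {
    path => "test.json"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    # The pattern is never expected to match, so every line is appended to the previous one and the whole file becomes a single event
    codec => multiline { pattern => "^Spalanzani" negate => true what => previous auto_flush_interval => 1 }
  }
}
filter {
  # Parse the joined text in [message] as JSON
  json {
    source => "[message]"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test"
  }
  stdout { codec => rubydebug }
}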
The problem is:
This works for small files, but as soon as I use the full JSON files (~35 MB, 60k lines each),
I get parsing errors, and the messages in the Kibana hits are just tracks/playlists cut off at arbitrary points.
I'm 100% sure the JSON files are well-formed and follow the structure shown above.
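For anyone who wants to reproduce that check outside Logstash, here is a minimal Python sketch. It only assumes the top-level "playlists" key and the per-playlist "tracks" key from the sample; the function name `check_playlist_file` and the shortened sample text are made up for illustration.

```python
import json

def check_playlist_file(text):
    """Parse the whole document and return (playlist_count, track_count).

    Raises json.JSONDecodeError if the text is not valid JSON and
    KeyError if the expected keys are missing.
    """
    doc = json.loads(text)
    playlists = doc["playlists"]
    track_count = sum(len(p["tracks"]) for p in playlists)
    return len(playlists), track_count

# Shortened version of the sample above (illustrative only)
sample = """
{"info": {"slice": "0-999", "version": "v1"},
 "playlists": [
   {"name": "Rock", "pid": 0, "tracks": [{"pos": 0, "track_name": "Song"}]},
   {"name": "Jazz", "pid": 0, "tracks": [{"pos": 0, "track_name": "Song"}]}
 ]}
"""

print(check_playlist_file(sample))
```

If a full file passes this check but still produces parse errors in Logstash, the text is probably being cut before it reaches the json filter rather than being malformed on disk.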
Could it be that the files are too big?
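If size is the cause, one possible workaround is to rewrite each file as newline-delimited JSON before Logstash reads it, with one playlist per line. This is a sketch under assumptions, not a confirmed fix: the multiline codec has `max_lines` and `max_bytes` settings that limit how large a joined event can get, which may be what cuts the events here. The function name `playlists_to_ndjson` is hypothetical, and copying the file-level "info" object into every event is just one design choice.

```python
import json

def playlists_to_ndjson(text):
    """Return one compact JSON line per playlist, keeping the file-level "info"."""
    doc = json.loads(text)
    info = doc.get("info", {})
    lines = []
    for playlist in doc["playlists"]:
        # Each event carries the shared "info" plus that playlist's own fields
        event = {"info": info, **playlist}
        # json.dumps without indent emits a single line; newlines inside strings are escaped
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

# Shortened version of the sample above (illustrative only)
small_doc = json.dumps({
    "info": {"slice": "0-999", "version": "v1"},
    "playlists": [
        {"name": "Rock", "pid": 0, "tracks": [{"pos": 0, "track_name": "Song"}]},
        {"name": "Jazz", "pid": 0, "tracks": [{"pos": 0, "track_name": "Song"}]},
    ],
})

ndjson = playlists_to_ndjson(small_doc)
print(ndjson, end="")
```

With one complete document per line, the multiline codec should no longer be needed, and the json filter on [message] would receive one whole playlist per event.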
I use the latest versions of Kibana, Logstash and Elasticsearch.
Thank you for your help