S3 output creating invalid JSON from CSV


#1

Hi, I'm reading from a CSV file and using the s3 output to store the events as JSON.
My Logstash config is below. The problem: the output is not a valid JSON array of JSON objects, but a file containing a sequence of JSON objects.
Any idea if that can be changed?

Instead of a file that looks like this:
{jsonobject_a} {jsonobject_b} {jsonobject_c}
I want a file like this:
[{jsonobject_a},{jsonobject_b},{jsonobject_c}]

My logstash.conf:
input {
  tcp {
    port => 4560
    codec => json_lines
    add_field => {
      "logstash_input" => "tcp_4560"
    }
  }
  beats {
    port => 5044
    add_field => {
      "logstash_input" => "beats"
    }
  }
}

filter {
  ...
  } else {
    csv {
      columns => [
        "date",
        "level",
        "server",
        "log_message"
      ]
      separator => "|"
    }
  }
}

output {
  s3 {
    region => "us-east-1"
    bucket => "....mybucket-logdata"
    codec => "json"
  }
}


(Magnus Bäck) #2

> The output is not a valid json array consisting of json objects but a file having a list of json objects in it.
> Any idea if that can be changed?

To avoid misunderstandings, please show examples instead of describing the situation. What do you get now? What would you like to get instead?


#3

Hi Magnus,
Logstash turns each line into a JSON object and puts all of those objects in the file it creates in S3. However, these objects are not wrapped in a JSON array; they are simply appended to the file one after the other.

Instead of this output in the S3 file (which is just a sequence of JSON objects):
{
  "date": "2018-04-18T09:26:35.0039150-05:00",
  "server": "abcd",
  "offset": 74,
  "level": "error",
  "prospector": {
    "type": "log"
  },
  "source": "/var/log/x_debug/x_debugyy.log",
  "message": "2018-04-18T09:26:35.0039150-05:00|error|yyy|Checking Product..",
  "logstash_input": "beats",
  "tags": ["beats_input_codec_plain_applied"],
  "@timestamp": "2018-05-07T12:52:15.481Z",
  "@version": "1",
  "beat": {
    "name": "ip-xxxx",
    "hostname": "ip-xxxx",
    "version": "6.1.2"
  },
  "host": "ip-xxxx",
  "log_message": "Checking Product..",
  "fields": {
    "source_system_id": "x_debug"
  }
} {
  "date": "2018-04-18T09:42:48.0478973-05:00",
  "server": "yyy",
  "offset": 154,
  "level": "Information",
  "prospector": {
    "type": "log"
  },
  "source": "/var/log/x_debug/x_debugyy.log",
  "message": "2018-04-18T09:42:48.0478973-05:00|Information|yyy|Checking Product..",
  "logstash_input": "beats",
  "tags": ["beats_input_codec_plain_applied"],
  "@timestamp": "2018-05-07T12:52:15.481Z",
  "@version": "1",
  "beat": {
    "name": "ip-xxx",
    "hostname": "ip-xxx",
    "version": "6.1.2"
  },
  "host": "ip-xxx",
  "log_message": "Checking Product..",
  "fields": {
    "source_system_id": "x_debug"
  }
}

I want the file that Logstash generates in S3 to have the following structure:
[{
  "date": "2018-04-18T09:26:35.0039150-05:00",
  "server": "abcd",
  "offset": 74,
  "level": "error",
  "prospector": {
    "type": "log"
  },
  "source": "/var/log/x_debug/x_debugyy.log",
  "message": "2018-04-18T09:26:35.0039150-05:00|error|yyy|Checking Product..",
  "logstash_input": "beats",
  "tags": ["beats_input_codec_plain_applied"],
  "@timestamp": "2018-05-07T12:52:15.481Z",
  "@version": "1",
  "beat": {
    "name": "ip-xxxx",
    "hostname": "ip-xxxx",
    "version": "6.1.2"
  },
  "host": "ip-xxxx",
  "log_message": "Checking Product..",
  "fields": {
    "source_system_id": "x_debug"
  }
}, {
  "date": "2018-04-18T09:42:48.0478973-05:00",
  "server": "yyy",
  "offset": 154,
  "level": "Information",
  "prospector": {
    "type": "log"
  },
  "source": "/var/log/x_debug/x_debugyy.log",
  "message": "2018-04-18T09:42:48.0478973-05:00|Information|yyy|Checking Product..",
  "logstash_input": "beats",
  "tags": ["beats_input_codec_plain_applied"],
  "@timestamp": "2018-05-07T12:52:15.481Z",
  "@version": "1",
  "beat": {
    "name": "ip-xxx",
    "hostname": "ip-xxx",
    "version": "6.1.2"
  },
  "host": "ip-xxx",
  "log_message": "Checking Product..",
  "fields": {
    "source_system_id": "x_debug"
  }
}]


(Magnus Bäck) #4

Surely each JSON object is on a line of its own rather than pretty-printed like this? Use the json_lines codec instead of json to make sure there's a line break between events.
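A minimal sketch of that change, assuming the rest of the pipeline stays exactly as posted in #1 (the bucket name is left elided as in the original post). Only the codec setting of the s3 output changes, from json to json_lines, so each event is written as one compact JSON object terminated by a newline:

```
output {
  s3 {
    region => "us-east-1"
    bucket => "....mybucket-logdata"
    # json_lines encodes each event as a single-line JSON string
    # followed by a newline, i.e. newline-delimited JSON
    codec => "json_lines"
  }
}
```

Note that the resulting S3 object is still not one JSON array; it is newline-delimited JSON, where every individual line is a valid JSON document.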

There are two very good reasons why Logstash works like this:

  • Logstash doesn't know when the file is "done", so it doesn't know when to write the final ].
  • Reading the kind of logfile you're asking for would be very costly if the file is big.
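To illustrate the second point, here is a hedged sketch of reading such a file back into Logstash. The s3 input generates one event per line of each object, so every line is decoded on its own and the file never has to be parsed as a whole. Region and bucket are copied from the original post; the plain json codec (not json_lines) is used because the input already splits on lines, which is what the Logstash codec documentation recommends for line-oriented inputs:

```
input {
  s3 {
    region => "us-east-1"
    bucket => "....mybucket-logdata"
    # The s3 input emits one event per line, so the json codec
    # only ever decodes a single event-sized JSON document at a time
    codec => "json"
  }
}
```

With a single large JSON array, by contrast, a typical JSON parser would need to read the whole array before yielding any of its elements.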
