Logstash - CSV to JSON


#1

Hopefully this is the right place. I am fairly new to the ELK stack, so I'm not sure whether what I am trying to do in Logstash is feasible.

I am consuming a CSV file and want to convert it into a JSON format as follows:

"properties":
{
  "date": "2015-09-26T16:33:53",
  "origin": "UK",
  "status": "SUCCESS"
},
"geometry":
{
  "type": "Point",
  "coordinates":
  [
    latitude,
    longitude
  ]
}

I was wondering whether Logstash can convert a file from a flat format such as CSV into GeoJSON. My Logstash config is below. I was hoping I could pass it some sort of template telling it to convert the data into the format above and then write the result out to Elasticsearch. Any advice or recommendations would be appreciated.

Alternatively, I was thinking of creating a small app in Java to do the conversion, but I was hoping Logstash had some built-in capability for this.

Thanks
Regards
Sam

input {
  lumberjack {
    # The port to listen on
    port => 5000

    # The paths to your ssl cert and key
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"

    # Set this to whatever you want.
    type => "my_data"
  }
}

filter {
  csv {
    columns => [Timestamp,status,latitude,longitude,countryCode,countryName,regionName]
    separator => ","
  }
  date {
    match => ["Timestamp", "yyyy-MM-dd HH:mm:ss"]
  }
}

output {
  elasticsearch {
    host => "localhost"
    protocol => http
    index => my_data
  }
}


(Magnus Bäck) #2

Have a look at the mutate filter. You can mostly get away with rename operations.

mutate {
  rename => {
    "Timestamp" => "[properties][date]"
    "countryCode" => "[properties][origin]"
    "status" => "[properties][status]"
  }
}

Oh, and another thing:

columns => [Timestamp,status,latitude,longitude,countryCode,countryName,regionName]

This needs to be:

columns => ["Timestamp", "status", ...]

(It would've been convenient if the csv filter could've created the nested fields you want in the end but I'm not sure that's possible. You can try using the [field][subfield] notation and see what happens.)
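
Putting the two suggestions above together, the filter section might look roughly like this. It is only a sketch and has not been tested; the column names are taken from the config in the first post.

```
filter {
  csv {
    # Column names written as quoted strings, as suggested above
    columns => ["Timestamp", "status", "latitude", "longitude", "countryCode", "countryName", "regionName"]
    separator => ","
  }
  date {
    match => ["Timestamp", "yyyy-MM-dd HH:mm:ss"]
  }
  mutate {
    # Move the flat CSV columns under a nested "properties" object
    rename => {
      "Timestamp" => "[properties][date]"
      "countryCode" => "[properties][origin]"
      "status" => "[properties][status]"
    }
  }
}
```

Note that the date filter writes the parsed value to @timestamp by default, so [properties][date] would still hold the original string from the CSV.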


#3

Thanks for responding. I could be doing something stupid, but I get the error below when I add the field type => Feature (config snippet below):

{:timestamp=>"2015-10-08T11:03:58.680000+0100", :message=>"Got error to send bulk of actions: [500] {"error":"IllegalArgumentException[Malformed action/metadata line [1], expected a simple value for field [_type] but found [START_ARRAY]]","status":500}", :level=>:error}

1.) In your response above you said the column names need to be surrounded by quotes. Is there a reason for this? It did work without them.
2.) My other mutate, where I am adding geometry, appears to be incorrect as it's failing the configtest. It could be that I'm not understanding it properly.

mutate {
  add_field => [ "geometry" { "co-ordinates" [ "%{latitude} %{longitude}" ] } ]
  add_field => {
    "type" => "Feature"
  }
  rename => {
    "Timestamp" => "[properties][date]"
  }
}

Note: I've placed the mutate block after the date filter in the original config, so it is inside the filter section.


(Magnus Bäck) #4

When you use add_field on the type field, which your input has already set, you actually turn type into an array with multiple values, which is what Elasticsearch is complaining about.
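
A minimal sketch of one way around this, assuming the goal is for type to hold the single value Feature: mutate's replace option overwrites the existing value instead of appending to it. This has not been tested against the config above.

```
mutate {
  # replace overwrites the value set by the input ("my_data"),
  # so type stays a single string rather than becoming an array
  replace => { "type" => "Feature" }
}
```

Since the error message shows the type field ending up in _type, the events would then presumably be indexed under the Elasticsearch type Feature.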

You can save yourself a lot of trouble by not sending to ES at this point. Use a stdout { codec => rubydebug } output until you've verified that the messages look as expected.
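
For example, the output section could temporarily be swapped for something like this while debugging:

```
output {
  # Print every event to the console as a readable structure
  # instead of indexing it into Elasticsearch
  stdout { codec => rubydebug }
}
```

Once the printed events have the nested properties and geometry fields you expect, the elasticsearch output can be put back.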

  1. I'm surprised if that worked. I don't know why.
  2. Yeah, your add_field syntax for geometry is really weird.

Maybe this works (because, again, add_field for an existing field creates an array):

add_field => ["[geometry][coordinates]", "%{latitude}"]
add_field => ["[geometry][coordinates]", "%{longitude}"]
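
In context, that could look something like the sketch below. It is untested. The [geometry][type] line and the second mutate are assumptions added for illustration: the first fills in the "Point" value from the target format, and the second is there because %{...} references produce strings rather than numbers.

```
mutate {
  # The first add_field creates the field; the second appends to it,
  # which should leave a two-element array
  add_field => ["[geometry][coordinates]", "%{latitude}"]
  add_field => ["[geometry][coordinates]", "%{longitude}"]
  add_field => ["[geometry][type]", "Point"]
}
mutate {
  # Assumption: convert is applied to each member of an array field
  convert => { "[geometry][coordinates]" => "float" }
}
```

Note that GeoJSON itself orders a position as longitude, then latitude; the order here follows the layout requested in the first post.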

#5

Thanks, that worked :) I am now trying to add a field following the mutate guide, where it says newfield => "static value", as shown below, but when I add this type field the configtest fails.

add_field => { "type" => "Feature" }

Current config below works, adding the above fails:

mutate {
  add_field => [ "geometry" { "co-ordinates" [ "%{latitude} %{longitude}" ] } ]
  rename => {
    "Timestamp" => "[properties][date]"
    "countryCode" => "[properties][countryCode]"
    "countryName" => "[properties][countryName]"
    "regionName" => "[properties][regionName]"
    "status" => "[properties][status]"
  }
}


(Magnus Bäck) #6

add_field => [ "geometry" { "co-ordinates" [ "%{latitude} %{longitude}" ] } ]

Wait, didn't you say the last time that this didn't work (and indeed, I don't understand how it ever could)?


#7

No Magnus, that code didn't work; it was me thinking I could add fields using JSON syntax.

What did work was the following, where the fields were not in quotes:

columns => [Timestamp,status,latitude,longitude,countryCode,countryName,regionName]

I'm currently trying to get this to work: add_field => { "type" => "Feature" }, but based on what you said above I'm guessing I should send it as an array, like below (I haven't tested it yet as I'm currently away from my computer):
add_field => [ ["type"] , "Feature"]


#8

Thanks Magnus, with your help I managed to sort out my config.

