Is it possible to specify data types for the columns in the csv filter?

This works, of course:

csv { 
    columns => ["mmsi", "status", "datetime", "lat", "long" ]
    separator  => ","
}

But it does not seem possible to add data types. This does not work:

csv { 
    columns => [
        "mmsi"     { "type" => "string" },
        "status"   { "type" => "string" },
        "datetime" { "type" => "date"   "format" => "YYYY-MM-dd" },
        "lat"      { "type" => "number" },
        "long"     { "type" => "number" }
    ]
    separator  => ","
}

What is the correct way to specify data types for given columns?


You need to use a mutate + convert to do it.
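For the numeric columns in your example, a minimal sketch could look like the following. It assumes the csv filter has already run, so the "lat" and "long" fields exist on the event. The string columns need no conversion, since csv produces strings by default.

```
filter {
    mutate {
        # Assumes "lat" and "long" were created by the csv filter above.
        # mutate/convert only accepts integer, float, string and boolean.
        convert => {
            "lat"  => "float"
            "long" => "float"
        }
    }
}
```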

Using this in the filter clause.

mutate {
    convert => {
        "datetime" => "date"
    }
}

passes --configtest, but on startup, this error gets thrown:

Invalid conversion type 'date', expected one of 'string,integer,float,boolean'

"datetime" is the field name, and "date" is the data type I would like to convert it to. But "date" (datatype) is not available.

Can you suggest an alternative?

Thanks.

-- Chris

Likewise, using the formats list

mutate {
    convert => {
        "datetime" => "date_hour_minute_second"
    }
}

throws this error:

Error: Cannot register filter mutate plugin. The error reported is:
Invalid conversion type 'date_hour_minute_second', expected one of 'string,integer,float,boolean'

I'm referencing this ES doc

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html

To be complete, here is my whole filter section:

filter {
    csv {
        columns => [ "mmsi", "datetime", "status", "lat", "long" ]
        separator => ","
    }
    mutate {
        add_field => {
            "[location][lat]" => "%{Latitude}"
            "[location][lon]" => "%{Longitude}"
        }
    }
    mutate {
        convert => {
            "[location][lat]" => "float"
            "[location][lon]" => "float"
        }
    }
    mutate {
        gsub => [
            # Strip off the milliseconds
            "datetime", "(..*)", ""
        ]
    }
    mutate {
        convert => {
            "datetime" => "date_hour_minute_second"
        }
    }
}

If you read the docs for the mutate filter (Logstash Reference [2.1]), they say:

Valid conversion targets are: integer, float, string, and boolean.

If you want to match a timestamp, use the date filter.
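As a rough sketch, something like this could replace the last mutate block. The pattern "yyyy-MM-dd HH:mm:ss.SSS" is only an assumption about what your "datetime" column looks like, so adjust it to your actual data. If the pattern includes the milliseconds, the gsub step that strips them should not be needed.

```
filter {
    date {
        # First element is the source field, the rest are candidate patterns.
        # The pattern here is a guess at the input format; change as needed.
        match  => [ "datetime", "yyyy-MM-dd HH:mm:ss.SSS" ]
        # Write the parsed timestamp back to "datetime".
        # Without target, the result goes into @timestamp instead.
        target => "datetime"
    }
}
```

Events whose "datetime" value does not match any listed pattern are tagged with _dateparsefailure by default, which makes it easy to spot a wrong pattern.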