Using Logstash 7.6.2 with the JDBC Input Plugin and a MySQL database, I'm trying to import a DATETIME field and apply a known timezone to that field on input, using the "date" filter. But "date" always throws an error when I try this.
Here's an extended example...
Assumptions:
- "job_timestamp" DATETIME field, with a value of "2020-03-09 01:42:59". It has no timezone offset, but we "know" it's in CDT (UTC -05:00). So the ISO8601 UTC datetime is "2020-03-09T06:42:59Z".
- The Logstash process' local timezone is PDT (UTC -07:00) - two hours behind CDT.
- We print out values with the "stdout" output and the "rubydebug" codec:
output {
  stdout { codec => rubydebug }
}
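- For reference, the jdbc input side looks roughly like this (the connection string, credentials, driver details, and SQL statement are placeholders, not my real values):
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"      # placeholder path
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"    # placeholder
    jdbc_user => "user"                                             # placeholder
    jdbc_password => "password"                                     # placeholder
    statement => "SELECT job_timestamp FROM jobs"                   # placeholder query
  }
}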
Results:
(1) With no "date" plugin applied, Logstash interprets the DATETIME as a local PDT time and adds 7 hours to get the UTC time 08:42:59:
{
    "job_timestamp" => 2020-03-09T08:42:59.000Z
}
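Restating the arithmetic, so the numbers below are easier to follow:
  stored DATETIME (no offset):       2020-03-09 01:42:59
  interpreted as PDT (UTC -07:00):   2020-03-09T08:42:59Z   <- what Logstash produces
  interpreted as CDT (UTC -05:00):   2020-03-09T06:42:59Z   <- what I actually want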
(2) Let's try applying the "date" filter to convey that "job_timestamp" is in CDT:
date {
  match => ["job_timestamp", "yyyy-MM-dd HH:mm:ss"]
  timezone => "US/Central"
  target => "new_timestamp"
}
But this returns a date parse failure, and does not create the expected new result field "new_timestamp":
{
    "tags" => [
        [0] "_dateparsefailure"
    ],
    "job_timestamp" => 2020-03-09T08:42:59.000Z
}
(It leaves the original "job_timestamp" to be auto-converted, as before.)
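My guess at why: by the time the filter block runs, "job_timestamp" is no longer the raw "yyyy-MM-dd HH:mm:ss" string from MySQL; the jdbc input has already converted it into a timestamp object, so the pattern has nothing to match against. A quick way to check is a throwaway "ruby" filter like this sketch ("job_timestamp_class" is just a debug field name I made up):
ruby {
  # copy the Ruby class name of the field into a debug field so it shows up in the rubydebug output
  code => 'event.set("job_timestamp_class", event.get("job_timestamp").class.to_s)'
}
I'd expect that to report LogStash::Timestamp rather than String.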
(3) What if we "mutate" the DATETIME to string before we try to apply "date"?
filter {
  mutate {
    convert => { "job_timestamp" => "string" }
  }
  date {
    match => ["job_timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "US/Central"
    target => "new_timestamp"
  }
}
Nope. "job_timestamp" does get converted to a string, but Logstash has already assumed it was a local PDT datetime, so "date" gets fed an incorrect ISO8601 string (which also doesn't match the "yyyy-MM-dd HH:mm:ss" pattern, hence the parse failure):
{
    "job_timestamp" => "2020-03-09T08:42:59.000Z",
    "tags" => [
        [0] "_dateparsefailure"
    ]
}
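For what it's worth, I don't think just switching the match pattern would rescue attempt (3) either. Something like the following should parse without errors, but as far as I understand the "date" filter, its "timezone" option only applies to input strings that don't carry their own offset, and this string already ends in "Z", so the wrong UTC instant would just be copied through unchanged:
date {
  match => ["job_timestamp", "ISO8601"]
  timezone => "US/Central"    # I believe this is ignored because the string already has a "Z" offset
  target => "new_timestamp"
}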
So my conclusion is that currently, there is no way for the user to apply a specified timezone conversion to incoming DATETIMEs from the JDBC Input Plugin. An early default assumption is made that any DATETIME without a timezone offset must be in the Logstash process' local timezone, and that assumption is applied before any data reaches "date", so "date" can't do much to fix it (without elaborate hacking, anyway). Can someone confirm my findings?
For now, I can do this work-around, but it's not pretty. Take the PDT-based ISO8601 string produced by "mutate"/"convert" above, and replace the "Z" UTC designator with "+02:00" to cancel out the 2-hour difference between PDT and CDT:
filter {
  mutate {
    convert => { "job_timestamp" => "string" }
    gsub => ["job_timestamp", "Z", "+02:00"]
  }
  date {
    match => ["job_timestamp", "ISO8601"]
    target => "new_timestamp"
  }
  prune {
    whitelist_names => [ "job_timestamp", "new_timestamp", "tags" ]
  }
}
Results in:
{
    "new_timestamp" => 2020-03-09T06:42:59.000Z,
    "job_timestamp" => "2020-03-09T08:42:59.000+02:00"
}
Finally! "new_timestamp" is now correct. We mutated the original DATETIME into a string carrying an offset that resolves to the correct UTC instant (seen in "job_timestamp" above), converted that string back into a datetime with the "date" plugin, and assigned it to the new "new_timestamp" field. (If I wanted to set "job_timestamp" to that value instead, I'd just set "job_timestamp" as the "target" of "date".)
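Another variant I considered but haven't tested: skip the string round-trip and shift the auto-converted "job_timestamp" directly in a "ruby" filter (applied instead of the mutate/date combination above). The hard-coded 7200 seconds is the 2-hour PDT -> CDT difference from this example, so this is only a sketch; it would break whenever the two offsets don't line up (DST transitions, a different server timezone, etc.):
ruby {
  # assumption: the error is always exactly 2 hours (PDT vs. CDT); subtract it from the wrongly-interpreted timestamp
  code => 'event.set("new_timestamp", LogStash::Timestamp.new(event.get("job_timestamp").time - 7200))'
}
It avoids the gsub, but baking a fixed offset into Ruby code is arguably even uglier, so I'm sticking with the mutate/date version.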
But yeah, being able to directly set the timezone for incoming DATETIMEs would be much better, if you know their timezone and can't convince the data producer to add the timezone offset to the data itself (possibly for fear of messing up existing schemas).