JDBC Input Plugin: "date" input plugin can't apply known timezone to incoming DATETIMEs

Using Logstash 7.6.2 with the JDBC Input Plugin, and a MySQL database, I'm trying to import a DATETIME field, but apply a known timezone to that field on input, using the "date" filter. But "date" is always throwing an error when I try this.

Here's an extended example...

Assumptions:

  • "job_timestamp" DATETIME field, with a value of "2020-03-09 01:42:59". It has no timezone offset, but we "know" it's in CDT (UTC -05:00). So the ISO8601 UTC datetime is "2020-03-09T06:42:59Z".
  • The Logstash process' local timezone is PDT (UTC -07:00) - two hours behind CDT.
  • We print out values with the "stdout" "ruby debug" output:
    output {
      stdout { codec => rubydebug }
    }

Results:
(1) With no "date" plugin applied, Logstash interprets the DATETIMEs as local PDT times, and adds 7 hours onto it to get the UTC time 8:42:59:

    {
        "job_timestamp" => 2020-03-09T08:42:59.000Z
    }

(2) Let's try to apply the "date" input filter, to convey that "job_timestamp" is in CDT:

    date {
        match => ["job_timestamp", "yyyy-MM-dd HH:mm:ss"]
        timezone => "US/Central"
        target => "new_timestamp"
    }

But this returns a date parse failure, and does not create the expected new result field "new_timestamp":

    {
                 "tags" => [
            [0] "_dateparsefailure"
        ],
        "job_timestamp" => 2020-03-09T08:42:59.000Z
    }

(It leaves the original "job_timestamp" to be auto-converted, as before.)

(3) What if we "mutate" the DATETIME to string before we try to apply "date"?

      mutate {
        convert => { "job_timestamp" => "string" }
      }

      date {
        match => ["job_timestamp", "yyyy-MM-dd HH:mm:ss"]
        timezone => "US/Central"
        target => "new_timestamp"
      }
    }

Nope. "job_timestamp" gets converted to a string, alright, but it has already assumed that it is a local PDT datetime, so "date" gets fed an incorrect ISO8601 date string:

    {
        "job_timestamp" => "2020-03-09T08:42:59.000Z",
                 "tags" => [
            [0] "_dateparsefailure"
        ]
    }

So my conclusion is that currently, there is no way for the user to apply a specified timezone conversion to incoming DATETIMEs from the JDBC Input Plugin. An early default assumption is made that any DATETIMEs without timezone offsets must be in the Logstash process' local timezone, and that assumption is made before any data is passed to "date", so "date" can't do much to fix it (without elaborate hacking, anyway). Can someone confirm my findings?

For now, I can do this work-around, but it's not pretty. Take the PDT ISO8601 string returned from "mutate"/"convert" above, and replace the "Z" UTC designator with "+02:00", to account for the PDT -> CDT timezone offset:

    filter {
      mutate {
        convert => { "job_timestamp" => "string" }
        gsub => ["job_timestamp", "Z", "+02:00"]
      }

      date {
        match => ["job_timestamp", "ISO8601"]
        target => "new_timestamp"
      }

      prune {
        whitelist_names => [ "job_timestamp", "new_timestamp", "tags" ]
      }
    }

Results in:

    {
        "new_timestamp" => 2020-03-09T06:42:59.000Z,
        "job_timestamp" => "2020-03-09T08:42:59.000+02:00"
    }

Finally! :stuck_out_tongue: "new_timestamp" is now correct. We mutated the original DATETIME into a string with the correct timezone (seen in "job_timestamp" above), converted that string back into a datetime with the "date" plugin, and assigned that into the new "new_timestamp" field. (If I wanted to set "job_timestamp" to that value instead, I'd just set "job_timestamp" as the "target" of "date".)

But yeah, just being able to directly set the timezone for incoming DATETIMEs would be much better, if you know their timezone, and you can't convince the data producer to add the timezone offset to the data itself (possibly for fear of messing up existing schemas).

Yes, the jdbc filter is converting the date/time column in the database to a LogStash::Timestamp. A date filter cannot parse that. So you need to mutate and gsub it.

2 Likes

Hello @Jonathan_Rice

This is a great post and wanted to offer some clarifications.

The Logstash date Filter expects a string. The _dateparsefailure tag is expected when not the date field is not actually converted to string.

The Logstash JDBC Input makes use of Sequel library (https://github.com/jeremyevans/sequel).
We have those layers of transformation:

  • The TZ configured on MySQL server side
  • The JDBC Driver optional parameters related to TZ (e.g. useTimezone, serverTimezone...)
  • The Sequel Library abstraction
  • Logstash JDBC Input

The JDBC Input plugin (documentation) offers 2 parameters related to timezone:

  • jdbc_default_timezone
    SQL does not allow for timezone data in timestamp fields. This plugin will automatically convert your SQL timestamp fields to Logstash timestamps, in relative UTC time in ISO8601 format.
    Using this setting will manually assign a specified timezone offset, instead of using the timezone setting of the local machine. You must use a canonical timezone, America/Denver, for example. See documentation and source code
  • plugin_timezone
    Give users the ability to force Sequel application side into using local timezone. See documentation

I would suggest to attempt using those options.

Another post which doesn't make use of those params: https://github.com/logstash-plugins/logstash-input-jdbc/issues/293#issuecomment-414546156

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.