Logstash stores :sql_last_value in the local timezone, breaking lookups for fresh DB records

Hello!

My goal is to set up continuous data migration from MySQL to Elasticsearch.

I'm stuck on :sql_last_value.
Logstash stores it in the local timezone, so the next time it queries the DB, the :sql_last_value it uses is wrong.

My local timezone is CEST (+02:00).

So if the last datetime value received from the DB was 2018-07-20T00:57:34.000Z, the next query will use :sql_last_value = 2018-07-20 02:57:34 and won't pick up any recently updated records.

How can this be solved?

Thanks.

My logstash.conf:

input {
  jdbc {
    jdbc_driver_library => "/local/path/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"    # run every minute
    statement => "SELECT * FROM table WHERE updatedAt > :sql_last_value ORDER BY updatedAt"
    use_column_value => true
    tracking_column => "updatedat"    # the plugin lowercases column names by default
    tracking_column_type => "timestamp"
  }
}

output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-migrate"
    document_type => "data"
    document_id => "%{id}"
  }
}

sql_last_value is always shifted to the local timezone in .logstash_jdbc_last_run, so in the next DB query :sql_last_value doesn't match the actual datetime values in the database.

The only way I've found to make scheduled incremental updates work is to set my local timezone to +00:00.

But that's not an appropriate solution.
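(Possibly a narrower variant would be to pin only the Logstash JVM to UTC instead of the whole machine, via the JVM's standard user.timezone property — a sketch, untested on my side:

LS_JAVA_OPTS="-Duser.timezone=UTC" bin/logstash -f logstash.conf

or append -Duser.timezone=UTC to config/jvm.options to make it permanent. The OS timezone stays untouched either way.)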

Does the jdbc_default_timezone option help you?

Nope, it converts the fetched dates into the given timezone.

But I need the real last-fetched timestamp value in my DB query for incremental updates to work.

I wonder how this is supposed to work if it shifts sql_last_value to the local timezone between queries!

But since the configuration file is written in Ruby, maybe I can manually correct the sql_last_value value before querying the DB?

I posted this issue on the logstash-input-jdbc GitHub; still no answer.

Could anyone please help me with this?

Maybe it's possible to correct sql_last_value by manually shifting it back by a given number of hours in logstash.conf before the statement is executed?

Sorry if I'm being a bore, but this bothers me a lot.

It's obvious that the Logstash JDBC plugin behaviour described in the docs is implemented improperly.

If only I could do something like

ruby {
  code => "sql_last_value = sql_last_value - 2.hours"
}

inside the input plugin,
or something like

parameters => {"corr_sql_last_value" => sql_last_value - 2.hours}
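Or maybe the shift could be pushed into the SQL statement itself, since :sql_last_value seems to be substituted as a plain timestamp literal that MySQL date functions can wrap — an untested sketch, assuming my fixed +02:00 offset (a hard-coded offset would break across DST changes):

statement => "SELECT * FROM table WHERE updatedAt > DATE_SUB(:sql_last_value, INTERVAL 2 HOUR) ORDER BY updatedAt"

(CONVERT_TZ(:sql_last_value, 'Europe/Belgrade', 'UTC') would handle DST as well, but it needs the MySQL timezone tables loaded.)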

But the only solution I've managed to find so far is to set the local timezone to UTC.

The jdbc_default_timezone setting works like this (there's a config sketch after the list):

  • I have a database and all its timestamps are in local time
  • Logstash/Elasticsearch/Kibana timestamps are always UTC
  • my local time is +1:00 off of UTC, meaning a db time of 2pm is 1pm UTC
  • I set this setting to an appropriate timezone string, e.g. "Europe/Paris"
  • the plugin will convert any date/time/timestamp values into UTC when it is receiving data
  • the plugin will convert any date/time/timestamp values into local time when it is sending data.
  • the plugin stores the sql_last_value as UTC
  • my Logstash event timestamps are in UTC
  • at the next run, when the sql_last_value is used in the SQL statement, the plugin converts the sql_last_value timestamp into "Europe/Paris" local time before it inserts it in the statement.
  • my sql_last_value has 1pm but on my db server the logged statement shows 2pm.
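For concreteness, here's a sketch of where it goes, reusing the input from your config above ("Europe/Paris" is just the example zone from this list):

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db"
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "SELECT * FROM table WHERE updatedAt > :sql_last_value ORDER BY updatedAt"
    use_column_value => true
    tracking_column => "updatedat"
    tracking_column_type => "timestamp"
    # the zone the db's wall-clock timestamps are in: values are converted
    # to UTC on read, and sql_last_value is converted back to this zone
    # when it is substituted into the statement
    jdbc_default_timezone => "Europe/Paris"
  }
}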

Thanks for the detailed explanation!

I have tried setting the jdbc_default_timezone.
My local timezone is +0200:

~$ timedatectl status | grep "Time zone"
       Time zone: Europe/Belgrade (CEST, +0200)

So I've set (note the inverted POSIX sign convention: Etc/GMT-2 actually means UTC+02:00):

jdbc_default_timezone => "Etc/GMT-2"

My last received date/time value is 2018-07-20T00:57:34.000Z.
That's its exact value in the DB, and it's what I've got in the Elasticsearch index.

mysql> select `updatedat` from `table` where `id` = 2;
+---------------------+
| updatedat           |
+---------------------+
| 2018-07-20 00:57:34 |
+---------------------+
1 row in set (0,01 sec)

So far so good.

But on the next DB query, sql_last_value was 2018-07-20 02:57:34 again. Shifted two hours.

And it's stored with the same value:

/usr/share/logstash$ cat .logstash_jdbc_last_run 
--- !ruby/object:DateTime '2018-07-20 02:57:34.000000000 +02:00'

I've even tried:

jdbc_default_timezone => "Etc/GMT-10"

And again got sql_last_value as 2018-07-20 02:57:34 in the SQL statement...

And in .logstash_jdbc_last_run:

/usr/share/logstash$ cat .logstash_jdbc_last_run 
--- !ruby/object:DateTime '2018-07-20 02:57:34.000000000 +10:00'

So the sql_last_value used in the actual DB query obviously doesn't depend on the jdbc_default_timezone setting.

So I've set my timezone to Asia/Shanghai (CST, +0800),
and got sql_last_value = 2018-07-20 08:57:34. Shifted eight hours.

Isn't it weird?
