Time difference in last_run_metadata_path file when Logstash and source database are in different timezones

Hello,

Our ELK Stack is placed in UTC timezone (OS timezone is set to UTC). The database from which we pull the data from using JDBC is in CET (UTC+2) timezone. The config file for the pipeline looks like this:

  jdbc {
    jdbc_driver_library => "/drivers/sqljdbc42.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://1.1.1.1;databaseName=test"
    jdbc_user => "test"
    jdbc_password => "test"
    schedule => "*/5 * * * *"
	statement => "select * from users where date > :sql_last_value order by date asc"
	use_column_value => true
	tracking_column_type => "timestamp"
    tracking_column => "date"
	last_run_metadata_path => "/last_run_metadata/.test"
	record_last_run => true
  }

Above pipeline is scheduled to run every 5 minutes.
Here's is the current data in users table:

user	date
John	2020-07-07 13:00:00
Ann		2020-07-07 14:00:00
Frank	2020-07-07 15:00:00

Let's assume that it's this pipeline's first run, so the query executed by Logstash will look more or less like this:

select * from db where date > '1970-01-01 00:00:00' order by date asc

With above settings, Logstash will pull all 3 rows and create 3 json documents. Immediately after that, it will create /last_run_metadata/.test file with the last date it will pull from users table. It should be 2020-07-07 15:00:00 as this is the last date from the dataset, but due to timezone difference, Logstash will add two hour difference and the actual content of /last_run_metadata/.test will look like this:

--- 2020-07-07 17:00:00.000000000 Z

After our pipeline finished working, and before the next run, there are two rows added to users table so it looks like this:

user	date
John	2020-07-07 13:00:00
Ann		2020-07-07 14:00:00
Frank	2020-07-07 15:00:00
Marry	2020-07-07 15:02:00
Gary	2020-07-07 15:04:00

The next pipeline run will contain the content of .test file and will look like this:

select * from db where date > '2020-07-07 17:00:00' order by date asc

And it will not retreive any data, as there are no records with dates greated than 2020-07-07 17:00:00 even though there are new records added to the table after the last run of our pipeline.

As a workaround we altered the statement, to offset those two hours:

select * from db where date > DATEADD(HOUR, -2, :sql_last_value) order by date asc

but it's not very flexible as there might be clients from other timezones as well.

Is there a better solution to handle that?

You might be able to set the timezone in the connection string.

Thank you for the suggestion. I run the test simulation of the jdbc with the following setup

ELK is in CEST timezone (UTC+2), database is in EEST timezone (UTC+3)

input {
	jdbc {
			jdbc_driver_library => "/drivers/sqljdbc42.jar"
			jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
			jdbc_connection_string => "jdbc:sqlserver://1.1.1.1;useTimezone=true;useLegacyDatetimeCode=false;serverTimezone=EEST"
			jdbc_user => "test"
			jdbc_password => "test"
			schedule => "*/1 * * * *"
			statement => "select CURRENT_TIMESTAMP as ts where CURRENT_TIMESTAMP > :sql_last_value"
			use_column_value => true
			tracking_column_type => "timestamp"
			tracking_column => "ts"
			last_run_metadata_path => "/last_run_metadata/.test"
			record_last_run => true
		}
}

filter {
}

output {
    stdout {}
}

Logstash output:

[INFO ] 2020-07-08 11:23:02.080 [Ruby-0-Thread-35: :1] jdbc - (0.121720s) select CURRENT_TIMESTAMP as ts where CURRENT_TIMESTAMP > '1970-01-01T00:00:00.000'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
"ts" => 2020-07-08T10:22:18.800Z,
"version" => "1",
"timestamp" => 2020-07-08T09:23:02.186Z
}
[INFO ] 2020-07-08 11:24:00.693 [Ruby-0-Thread-35: :1] jdbc - (0.066828s) select CURRENT_TIMESTAMP as ts where CURRENT_TIMESTAMP > '2020-07-08T10:22:18.800'
{
"ts" => 2020-07-08T10:23:17.473Z,
"version" => "1",
"timestamp" => 2020-07-08T09:24:00.709Z
}

Running the statement via SQL client:
select CURRENT_TIMESTAMP

2020-07-08 12:21:41.067

There is 3 hour difference between UTC and EEST. timestamp field is properly converted to UTC, but seems like ts is still off by an hour, which means that last_run_metadata file is also a bit off.

Try this into the query:

statement => "select (CURRENT_TIMESTAMP  AT TIME ZONE 'UTC')::timestamptz as ts where (CURRENT_TIMESTAMP AT TIME ZONE 'UTC')::timestamptz > :sql_last_value"

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.