Hello,
Our ELK Stack is placed in UTC timezone (OS timezone is set to UTC). The database from which we pull the data from using JDBC is in CET (UTC+2) timezone. The config file for the pipeline looks like this:
jdbc {
jdbc_driver_library => "/drivers/sqljdbc42.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://1.1.1.1;databaseName=test"
jdbc_user => "test"
jdbc_password => "test"
schedule => "*/5 * * * *"
statement => "select * from users where date > :sql_last_value order by date asc"
use_column_value => true
tracking_column_type => "timestamp"
tracking_column => "date"
last_run_metadata_path => "/last_run_metadata/.test"
record_last_run => true
}
Above pipeline is scheduled to run every 5 minutes.
Here's is the current data in users table:
user date
John 2020-07-07 13:00:00
Ann 2020-07-07 14:00:00
Frank 2020-07-07 15:00:00
Let's assume that it's this pipeline's first run, so the query executed by Logstash will look more or less like this:
select * from db where date > '1970-01-01 00:00:00' order by date asc
With above settings, Logstash will pull all 3 rows and create 3 json documents. Immediately after that, it will create /last_run_metadata/.test file with the last date it will pull from users table. It should be 2020-07-07 15:00:00 as this is the last date from the dataset, but due to timezone difference, Logstash will add two hour difference and the actual content of /last_run_metadata/.test will look like this:
--- 2020-07-07 17:00:00.000000000 Z
After our pipeline finished working, and before the next run, there are two rows added to users table so it looks like this:
user date
John 2020-07-07 13:00:00
Ann 2020-07-07 14:00:00
Frank 2020-07-07 15:00:00
Marry 2020-07-07 15:02:00
Gary 2020-07-07 15:04:00
The next pipeline run will contain the content of .test file and will look like this:
select * from db where date > '2020-07-07 17:00:00' order by date asc
And it will not retreive any data, as there are no records with dates greated than 2020-07-07 17:00:00 even though there are new records added to the table after the last run of our pipeline.
As a workaround we altered the statement, to offset those two hours:
select * from db where date > DATEADD(HOUR, -2, :sql_last_value) order by date asc
but it's not very flexible as there might be clients from other timezones as well.
Is there a better solution to handle that?