How does the JDBC input plugin keep the timestamp last value?

Hello

I just want to figure out how the JDBC input plugin writes the state value to the last_run_metadata_path file.

Let me explain:

  • We have a SQL Server database and we are ingesting a table that now has a "last_updated" column of DATETIME type
  • This type does not store TZ information
  • We are in UTC -03:00
  • We have a production and a test environment, each with its own Logstash VM and SQL Server database
  • We've tried to keep them as equal as possible, but there might be differences in the database configuration, or maybe something on the Logstash VM side (but not the pipeline itself)
  • I wanted to use jdbc_default_timezone => "America/Sao_Paulo", but couldn't because of this (see the sketch just below this list):
    https://github.com/logstash-plugins/logstash-input-jdbc/issues/121
    TZInfo::AmbiguousTime exception when parsing JDBC column
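
For context on that exception, here's a minimal Ruby sketch (the date and the dst flag are illustrative assumptions, not taken from the issue): Brazil's DST fall-back made some late-evening wall-clock times occur twice, and TZInfo refuses to guess which one you meant:

require 'tzinfo'

tz = TZInfo::Timezone.get('America/Sao_Paulo')

# Illustrative date: on 2018-02-17 Brazilian DST ended at midnight and clocks
# rolled back to 23:00, so local times between 23:00 and 23:59 occurred twice.
begin
  tz.local_to_utc(Time.utc(2018, 2, 17, 23, 30))
rescue TZInfo::AmbiguousTime => e
  puts "Ambiguous local time: #{e}"
end

# Passing an explicit dst flag resolves the ambiguity:
puts tz.local_to_utc(Time.utc(2018, 2, 17, 23, 30), true) # true = pick the DST period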

Here is my pipeline configuration:

input {
  jdbc {
    ...
    #jdbc_default_timezone => "America/Sao_Paulo"
    use_column_value => true
    tracking_column => "last_updated" # last_updated_date
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/etc/logstash/.last_updated"
    statement => "
      SELECT ...
      FROM my_table
      WHERE
        -- Environment: Test
        -- Converts 'last_updated' to UTC so it is comparable to :sql_last_value
        -- (which in the Test environment is kept in UTC by Logstash)
        --dateadd(hh, datediff(hh, getdate(), getutcdate()), last_updated) > :sql_last_value

        -- Environment: Production
        -- Somehow in this environment Logstash keeps sql_last_value in local time (GMT -03:00),
        -- so we should not convert 'last_updated' to UTC
        last_updated > :sql_last_value
      ORDER BY last_updated ASC"
  }
}

As you can see in the comments, I ran into an important unexpected difference in behavior between the two environments.

Since we are 3 hours behind UTC, when I first ran the pipeline in the Production environment with the dateadd() conversion, Logstash started re-ingesting the whole table: every local time from the table column, once converted to UTC (3 hours added), is greater than the local-time sql_last_value.
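
To make the arithmetic concrete, here's a tiny Ruby illustration with made-up values matching the state files below (SQL compares naive DATETIME values, which lexicographic string comparison mimics well enough here):

# Hypothetical row and state, as naive (zone-less) wall-clock datetimes:
last_updated   = '2020-08-17 10:23:00' # DATETIME column, local wall time
sql_last_value = '2020-08-17 10:23:00' # stored by Logstash in local time

# dateadd(hh, 3, last_updated) shifts the wall time three hours ahead:
converted = '2020-08-17 13:23:00'

puts converted > sql_last_value # => true: an already-ingested row matches again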

Here's how the state is kept in the Test environment:

--- 2020-08-17 13:23:00.000000000 Z

And here's how the state is kept in the Production environment:

--- 2020-08-17 10:23:00.000000000 -03:00
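
Those two files are exactly what Ruby's YAML serializer (Psych) produces for a UTC Time versus a local Time. A minimal sketch, assuming the plugin simply dumps the tracked value with YAML.dump:

require 'yaml'

# A Time flagged as UTC serializes with a trailing "Z"...
t = Time.utc(2020, 8, 17, 13, 23)
puts YAML.dump(t)                    # --- 2020-08-17 13:23:00.000000000 Z

# ...while the same instant as a local Time keeps its UTC offset:
puts YAML.dump(t.getlocal('-03:00')) # --- 2020-08-17 10:23:00.000000000 -03:00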

I've tried to take a look at the plugin's code, and one thing worth noting:

The parameter jdbc_default_timezone is not set in either environment.

So here is my question:

What made the plugin behave differently in the two environments?

Thank you for your help.

Possibly one of the processes has TZ set in its environment and the other does not? It is really a Ruby question.
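
For illustration, a minimal MRI Ruby sketch (Logstash actually runs JRuby, which resolves its default zone through the JVM, but the serialization is analogous): whether the Time object ends up flagged UTC or local decides which of the two file formats you get, and the local offset comes from the process's zone:

require 'yaml'

# 1597670580 is 2020-08-17 13:23:00 UTC. Time.at returns a *local* Time,
# whose offset comes from ENV['TZ'] if set, otherwise from the system zone
# (e.g. /etc/localtime on Linux).
t = Time.at(1597670580)
puts YAML.dump(t)        # on a UTC-03:00 box: --- 2020-08-17 10:23:00.000000000 -03:00
puts YAML.dump(t.getutc) # explicitly UTC:     --- 2020-08-17 13:23:00.000000000 Z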

Just confirmed that there is no $TZ set in either environment...

But it definitely looks like Ruby is getting (or not getting) something from the environment. I don't actually have a problem with this anymore, since I've adjusted my WHERE clause, but it would be nice to find out why it happened; it took us by surprise when we deployed the new pipeline in the Production environment after a week of tests in the Test environment...

Anyway, lesson learned: always check the output with rubydebug first, even if it's already been tested.

Thanks for your reply.
