Confused about the behaviour of last_run_metadata_path in JDBC input

I am trying to get some data out of a SQL database.
Here is the configuration:

input
{
        jdbc
        {
                jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
                jdbc_connection_string => ""
                jdbc_user => ""
                jdbc_password => ""

                statement => "select TOP 500 a.ResultID, a.UnitIdentifier, b.Result from MyDB.Res a inner join MyDb.ResData b on a.ResultID=b.ResultID where a.ResultID > :sql_last_value and a.Code like '9840%' and a.CreatedOn > '2020-01-01' order by a.ResultID"

                tracking_column => "ResultID"
                use_column_value => true
                tracking_column_type => "numeric"
                schedule => "*/1 * * * *"
                jdbc_paging_enabled => "true"
                jdbc_default_timezone => "Australia/Sydney"
                jdbc_page_size => "500"
                last_run_metadata_path => "/Drive/logstash/logs/logstash_jdbc_last_run"
        }
}

A funny thing about ResultID is that it is a negative number. Why, I do not know, but it keeps increasing in subsequent records.

The contents of logstash_jdbc_last_run remain zero after I run it from the terminal:
--- 0

In the logs I see
where a.ResultID > 0
on repeated runs. Should it not be the value of the last run instead of 0?
Logstash version: 7.7.0

If ResultID is negative then I would expect your SQL to test 'where a.ResultID < :sql_last_value' (< instead of >). I believe 0 is the default initial value when you use a numeric tracking column.
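One way to work with that 0 default (a sketch, not something from this thread: it relies on the plugin reading its starting value from the YAML file at last_run_metadata_path, and the seed number below is a made-up placeholder) is to seed that file, before the first run, with a value below the smallest existing ResultID:

    --- -1000000

The first run then substitutes -1000000 for :sql_last_value, so the '>' comparison matches all the existing negative IDs, and the file ends up holding the ID of the last row returned. Note also that the jdbc input lowercases column names by default, so a tracking_column of "ResultID" may need to be written as "resultid" (or lowercase_column_names set to false) for the tracked value to be picked up at all.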

@Badger They started the numbering of the primary field from the most negative value so as to get the maximum range.

The rough representation is:
-11, -10, -9, -8, -7

If I keep a.ResultID < :sql_last_value, then at the end of the first run :sql_last_value will have the value -7.
The second run will then re-ingest the first 4 elements, and :sql_last_value will end up as -8.

Two things I have noticed:

  1. a.ResultID > :sql_last_value leads to where a.ResultID > 0 in the logs. It remains unchanged between queries.

  2. I tried to get around it by creating an alias.

    statement => "select a.ResultID, a.ResultID * (-1) as resid , a.UnitIdentifier, b.Result from MyDB.Res a inner join MyDb.ResData b on a.ResultID=b.ResultID where a.ResultID > :sql_last_value and a.Code like '9840%' and a.CreatedOn > '2020-01-01' order by resid ASC"
    tracking_column => "resid"
    use_column_value => true
    tracking_column_type => "numeric"

This gives an error:
Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::ComMicrosoftSqlserverJdbc::SQLServerException: Invalid column name 'resid'.

The query runs fine in my Microsoft SQL Server Management Studio.

EDIT 1
Just read through this.
Bottom line: a column alias can be used in an ORDER BY clause, but it cannot be used in a WHERE, GROUP BY, or HAVING clause.
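Given that restriction, one workaround (a sketch only, untested against this schema) is to compute the alias inside a derived table, so that resid is a real column of the outer query and is visible to WHERE and ORDER BY, and to whatever wrapper the paging option may put around the statement:

    statement => "select t.resid, t.UnitIdentifier, t.Result from (select a.ResultID * (-1) as resid, a.UnitIdentifier, b.Result from MyDB.Res a inner join MyDb.ResData b on a.ResultID=b.ResultID where a.ResultID > :sql_last_value and a.Code like '9840%' and a.CreatedOn > '2020-01-01') t order by t.resid ASC"
    tracking_column => "resid"
    use_column_value => true
    tracking_column_type => "numeric"

Note that since resid is the negated ID, it decreases as ResultID increases, so the comparison direction for incremental pulls still needs thought.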

EDIT 2
I am trying out this logic:
statement => "select a.ResultID , a.UnitIdentifier, b.Result from MyDB.Res a inner join MyDb.ResData b on a.ResultID=b.ResultID where a.ResultID < :sql_last_value and a.Code like '9840%' and a.CreatedOn > '2020-01-01' order by a.ResultID DESC"

This means that the data is pulled in reverse chronological order:
Dec, Nov, Oct, Sept......

This is fine by me; as long as all the data comes in, it is fine.
But I am concerned that I have turned the logic of incrementally pulling future data from the database on its head.
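For what it is worth, tracing the sample IDs (-11 through -7) through the EDIT 2 statement suggests the concern is justified, assuming the plugin records the tracking value from the last row returned. A sketch of the substituted queries (not actual log output):

    -- run 1, :sql_last_value = 0: matches -11..-7; last row in DESC order is -11
    select ... where a.ResultID < 0 ... order by a.ResultID DESC
    -- run 2, :sql_last_value = -11: matches nothing, and future rows (-6, -5, ...) never will
    select ... where a.ResultID < -11 ... order by a.ResultID DESC

So '<' plus DESC appears to give one complete historical pull and then stop; it would not pick up future data incrementally.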

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.