How should I use sql_last_value in logstash?

I'm quite unclear about what sql_last_value does when I give my statement like this:

statement => "SELECT * from mytable where id > :sql_last_value"

I roughly understand the reason for using it: instead of scanning the whole table on every run, it only picks up the records that were newly added since the last run. Correct me if I'm wrong.

So what I'm trying to do is create the index using Logstash like this:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://hostmachine:3306/db"
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_validate_connection => true
        jdbc_driver_library => "/path/mysql_jar/mysql-connector-java-5.1.39-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        schedule => "* * * * *"
        statement => "SELECT * from mytable where id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
    }
}

output {
    elasticsearch {
        #protocol => http
        index => "myindex"
        document_type => "message_logs"
        document_id => "%{id}"
        action => "index"
        hosts => ["http://myhostmachine:9402"]
    }
}

Once I do this, no documents get indexed at all. Where am I going wrong?

Any help would be appreciated.

Simplify things by removing the elasticsearch configuration for now. Use a stdout { codec => rubydebug } output. What is the plugin doing? Crank up Logstash's loglevel to find out more.
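A minimal debugging output along those lines might look like this (just a sketch; swap it in for the elasticsearch output while troubleshooting):

    output {
        # print every event to the console so you can see exactly what the jdbc input produces
        stdout {
            codec => rubydebug
        }
    }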

@magnusbaeck Yes, I did what you suggested: added the rubydebug codec and ran with debug logging as well.

Part of the output:

[2016-11-02T16:52:00,276][INFO ][logstash.inputs.jdbc ] (0.002000s) SELECT count(*) AS count FROM (SELECT * from TEST where id > '2016-11-02 11:21:00') AS t1 LIMIT 1
[2016-11-02T16:52:00,279][DEBUG][logstash.inputs.jdbc ] Executing JDBC query {:statement=>"SELECT * from TEST where id > :sql_last_value", :parameters=>{:sql_last_value=>2016-11-02 11:21:00 UTC}, :count=>0}
[2016-11-02T16:52:00,287][INFO ][logstash.inputs.jdbc ] (0.003000s) SELECT count(*) AS count FROM (SELECT * from TEST where id > '2016-11-02 11:21:00') AS t1 LIMIT 1
[2016-11-02T16:52:00,582][DEBUG][logstash.pipeline ] Pushing flush onto pipeline

What should I be checking in this output? I'm racking my brain over this. :confused:

Can't I use the table's id column to pick up newly added records and update the index? I tried it with date and datetime fields and it works perfectly fine, but I need to make it work with the id.

I suspect you've run the plugin at least once before setting tracking_column, so Logstash has saved a timestamp in ~/.logstash_jdbc_last_run but never updates that file, since the resulting query never returns any rows. Try deleting the file or changing its contents to the id you want to start from.
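If it helps, my understanding is that the file holds a single YAML-serialized value, so when tracking a numeric id its entire contents might be nothing more than (illustrative):

    --- 0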

@magnusbaeck I deleted the .logstash_jdbc_last_run file, and I also tried setting its value to 0 so that it would pick up changes based on the id field in the database records. But still no luck.

When I run the Logstash conf, .logstash_jdbc_last_run somehow ends up with a timestamp value again. I can't figure out where it's picking the timestamp up from, even after deleting the file or changing it to zero.

Is there any jdbc property I'm missing above?

Thanks.

Okay. Then I'm not sure what's going on. I've never used the jdbc input myself.

@magnusbaeck Oh, alright. Is there anything else I could follow up on for this?

Or any other recommended source to refer to? :slight_smile:

Thanks.

I managed to make it work. For those whom it might help, this was my final jdbc input in Logstash:

    jdbc {
        jdbc_connection_string => "jdbc:mysql://myhostmachine:3306/mydb"
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_validate_connection => true
        jdbc_driver_library => "/mypath/mysql-connector-java-5.1.39-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        schedule => "* * * * *"
        statement => "SELECT * from mytable where id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
        tracking_column_type => "numeric"
        clean_run => true
        last_run_metadata_path => "/path/.logstash_jdbc_last_run"
    }

Make sure you delete the .logstash_jdbc_last_run file before you run your Logstash conf, and use a different last_run_metadata_path for each instance if you're running multiple Logstash instances.
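For example, a sketch of how two pipelines could keep their state separate (the table names and file paths here are only illustrative):

    # pipeline 1
    jdbc {
        # ... connection settings ...
        statement => "SELECT * from orders where id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
        tracking_column_type => "numeric"
        last_run_metadata_path => "/path/.logstash_jdbc_last_run_orders"
    }

    # pipeline 2
    jdbc {
        # ... connection settings ...
        statement => "SELECT * from customers where id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
        tracking_column_type => "numeric"
        last_run_metadata_path => "/path/.logstash_jdbc_last_run_customers"
    }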

I am getting the error "Unknown setting 'tracking_column_type' for jdbc {:level=>:error}".

@hearvishwas, please consult the documentation for your version of Logstash (the option might not have been available until a later release) and start a new thread if you have follow-up questions.

Hi @magnusbaeck,
I am facing the same issue as @Kulasangar_Gowrisang.
I am able to run Logstash with jdbc when I remove tracking_column_type => "numeric", but then Logstash re-reads old records, so ES is getting filled up with duplicate records. By the way, it's not tracking_column_type, it's just "type".
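On the duplicates point: giving each document a deterministic id in the elasticsearch output, as the original config at the top of this thread already does, means a re-read row overwrites the existing document instead of creating a new one. Roughly (host and index names taken from that config):

    output {
        elasticsearch {
            hosts => ["http://myhostmachine:9402"]
            index => "myindex"
            # reuse the table's primary key so re-read rows update the same document
            document_id => "%{id}"
        }
    }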