JDBC and consistency of data

I am doing a simple test.
I connected to an Oracle DB using JDBC and pulled two fields (name, date); a total of 3000 records got pulled into Logstash, which matches what I see when I check the Oracle DB. It ran automatically a few times, since my config schedules it every 10 minutes, and the record count stayed the same (3000).

Now I deleted one record from that table in Oracle, but Logstash didn't remove that record from its data. How do I achieve this: when information changes in the Oracle DB, it should pull that change on the next scheduled run (which is every 10 minutes)?

But if I restart logstash.service, the new number is reflected on my dashboard and I get 2999.

Here is my config file:

input {
  jdbc {
    jdbc_validate_connection => true
    jdbc_connection_string => "jdbc:oracle:thin:@oradev01:1521/CLIENT"
    jdbc_user => "user"
    jdbc_password => "user"
    jdbc_driver_library => "/usr/lib/oracle/12.2/client64/lib/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    statement => "SELECT name, entered FROM CLIENTS WHERE entered > :sql_last_value"
    last_run_metadata_path => "/tmp/logstash-Clients.lastrun"
    use_column_value => true
    tracking_column => "entered"
    tracking_column_type => "timestamp"
    clean_run => true
    record_last_run => true
    schedule => "*/10 * * * *"
  }
}

output {
  elasticsearch {
    index => "clients-name"
  }
}

What are you attempting to accomplish? The JDBC input is working as configured:

  • The first time you run it, 3000 rows are emitted from the database, creating 3000 events in Logstash that are persisted in Elasticsearch as 3000 docs. The highest value seen for your tracking_column is saved to a file and is used in future requests as :sql_last_value (see the example queries after this list).
  • The second time the schedule runs the job, the JDBC input uses your configured tracking_column and statement to select only rows where entered > :sql_last_value, which causes the DB to emit zero rows.
  • After you have deleted a row, the scheduler runs again, again selecting only rows where entered > :sql_last_value; the deleted row has no newer timestamp than anything already seen, so the DB still emits zero rows.
  • When you restart Logstash, the input's clean_run => true directive causes it to ignore the cached tracking_column value, starting over from the beginning. Since there are 2999 rows in the database at this point, all 2999 are emitted.
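
To make that concrete, here is a sketch of the queries the input effectively sends to Oracle. The timestamp literals are hypothetical stand-ins for the cached tracking value; the exact literal the plugin generates may be quoted differently:

-- First run: nothing has been cached yet, so :sql_last_value defaults to
-- the epoch and every row matches.
SELECT name, entered FROM CLIENTS WHERE entered > TIMESTAMP '1970-01-01 00:00:00'

-- Every later run: :sql_last_value is the highest "entered" value seen so
-- far (hypothetical value below). A deletion never produces a row with a
-- newer timestamp, so the query returns zero rows.
SELECT name, entered FROM CLIENTS WHERE entered > TIMESTAMP '2019-05-01 09:30:00'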

Oh, so if I do not have clean_run => true it will not start over and I will still have 3000 rows. But in fact I need 2999 as the new row count, because a row has been deleted. How do I achieve that?

Basically I want Logstash to reflect what is in the database: if something is deleted from the DB, delete it from Logstash's records; if something is added to the DB, add it to the count.

Logstash has no way of tracking deleted rows.
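
If you can change the schema, a common workaround is to stop hard-deleting rows and instead flag them, letting Logstash translate the flag into an Elasticsearch delete. The sketch below is only an outline under that assumption: the deleted and last_modified columns are hypothetical additions to your table, and name is assumed to be unique (use your real primary key for document_id if it is not):

input {
  jdbc {
    # ... connection settings as in your config ...
    # Track a modification timestamp rather than the insert timestamp, so
    # flagging a row as deleted makes it reappear in the next run.
    statement => "SELECT name, entered, deleted, last_modified FROM CLIENTS WHERE last_modified > :sql_last_value"
    use_column_value => true
    tracking_column => "last_modified"
    tracking_column_type => "timestamp"
    schedule => "*/10 * * * *"
  }
}

filter {
  # Route flagged rows to a delete action; everything else is (re)indexed.
  if [deleted] == 1 {
    mutate { add_field => { "[@metadata][action]" => "delete" } }
  } else {
    mutate { add_field => { "[@metadata][action]" => "index" } }
  }
}

output {
  elasticsearch {
    index => "clients-name"
    # A stable per-row id makes repeated runs update the same document
    # instead of creating duplicates, and lets the delete action find it.
    document_id => "%{name}"
    action => "%{[@metadata][action]}"
  }
}

With something like this you would no longer need clean_run => true or a restart: each scheduled run picks up inserts, updates, and soft deletes made since the previous run.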

I tested clean_run => false and it is what I needed at this time. Thank you.
