About logstash jdbc input


(supermario) #1

I have a mysql database . A big table have 2000 billion rows records.
I create a config file and use jdbc input, content:

input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/jars/mysql-connector-java-5.1.47.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://...:3306/mydatabase"
jdbc_user => "root"
jdbc_password => "pawwrod"
use_column_value => true
tracking_column=> "ID"
tracking_column_type=> "numeric"
jdbc_paging_enabled => true
statement => "SELECT * from mytable where ID >:sql_last_value "
}
}

I know ,not set schedule , then the statement is run exactly once.it only init my data.

My question is when this job is finish. I modify config,set schedule '0 0 */1 * *'(one day execute one times ). Whether Last_value from the last execution.

My goal is to initialize the data.Then perform the task regularly.


(Guy Boertje) #2

You probably want to set last_run_metadata_path to a file you are in control of, this file will hold the ID last used as sql_last_value.

2000 billion rows is a lot to process - it will take many hours and things can go wrong.

It may be better to run, say, 10 logstash instances with each one loading a smaller subset of the IDs.

Each LS instance will still write the sql_last_value to the file, so if say on LS 2 that is doing where ID >= 1000000 AND ID < 2000000 something goes wrong after doing ID of 1234567 then you can change the statement to where ID > 1234567 AND ID < 2000000 and so on.

When those 10 are done then change the values in each statement to the next set.


(supermario) #3

thanks your reply, sql_last_value parameter in the form of a metadata file stored in the configured `last_run_metadata_path(quote logstash documention). so,i can set last_run_metadata_path to max id,when the init job finish?


(Guy Boertje) #4

Yes.


(supermario) #5

thanks