I know that if `schedule` is not set, the statement runs exactly once; I only use that run to initialize my data.
My question is what happens after this job has finished: if I then modify the config and set `schedule => "0 0 */1 * *"` (run once a day), will `sql_last_value` continue from the last execution?
My goal is to initialize the data first and then run the task regularly.
You probably want to set `last_run_metadata_path` to a file you control; this file holds the ID last used as `sql_last_value`.
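A minimal sketch of such a `jdbc` input, assuming a hypothetical table `my_table` with an increasing numeric `id` column and a hypothetical metadata file path; the connection settings are left out:

```
input {
  jdbc {
    # jdbc_connection_string, jdbc_user, jdbc_driver_library and
    # jdbc_driver_class omitted for brevity
    # my_table is hypothetical; ORDER BY matters because the stored value
    # is the tracking column of the last row returned
    statement => "SELECT * FROM my_table WHERE id > :sql_last_value ORDER BY id ASC"
    use_column_value => true          # track a column value instead of the last run time
    tracking_column => "id"
    tracking_column_type => "numeric"
    # hypothetical path to a file you control; it keeps sql_last_value between runs
    last_run_metadata_path => "/var/lib/logstash/my_table_last_run"
    # leave schedule out for the one-off initial load, add it afterwards
    schedule => "0 0 */1 * *"         # once a day at 00:00
  }
}
```

With `clean_run` left at its default of `false`, the very first run starts from `sql_last_value` = 0 (numeric tracking) if the file does not exist yet. Later runs, including the first scheduled one after the initial load, read the last `id` back from that same file and only fetch newer rows.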
2000 billion rows is a lot to process: it will take many hours, and things can go wrong.
It may be better to run, say, 10 Logstash instances, each loading a smaller subset of the IDs.
Each LS instance will still write its `sql_last_value` to the file, so if, say, LS 2 is doing `WHERE ID >= 1000000 AND ID < 2000000` and something goes wrong after ID 1234567, you can change its statement to `WHERE ID > 1234567 AND ID < 2000000`, and so on.
When those 10 are done, change the values in each statement to the next set.
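A sketch of the input for instance 2 in that example, assuming a hypothetical table `my_table` and a hypothetical file path. Giving each instance its own `last_run_metadata_path` keeps the instances from overwriting each other's value:

```
input {
  jdbc {
    # connection settings omitted
    # instance 2 of 10: one fixed slice of IDs (my_table is hypothetical);
    # to resume after id 1234567, change the range to "id > 1234567 AND id < 2000000"
    statement => "SELECT * FROM my_table WHERE id >= 1000000 AND id < 2000000 ORDER BY id ASC"
    use_column_value => true          # record the id of the last row, not the run time
    tracking_column => "id"
    tracking_column_type => "numeric"
    # separate (hypothetical) file per instance; this instance's sql_last_value is written here
    last_run_metadata_path => "/var/lib/logstash/my_table_ls2_last_run"
    # no schedule, so the statement runs exactly once
  }
}
```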
Thanks for your reply. The Logstash documentation says `sql_last_value` is kept "in the form of a metadata file stored in the configured `last_run_metadata_path`". So, when the initial job finishes, can I set `last_run_metadata_path` to the max ID?