I am trying to pull all the data from an Oracle DB table for the first load; after that it would be delta loads based on a tracking column and a schedule.
Below is the jdbc input configuration in Logstash:
input {
  jdbc {
    jdbc_connection_string => "${oracle_jdbc_connection_string}"
    jdbc_user => "${oracle_jdbc_user}"
    jdbc_password => "${oracle_jdbc_password}"
    jdbc_driver_library => ""
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_fetch_size => 1000
    statement_filepath => "${logstash_project_path}"
    last_run_metadata_path => "${logstash_project_data_path}"
    use_column_value => true
    tracking_column => "${tracking_column}"
    tracking_column_type => "numeric"
    schedule => "* * * * * *"
    type => "${type}"
  }
}
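For context, the SQL in the file referenced by statement_filepath is not shown above (the real path and table/column names are parameterized), but it is roughly of this shape, using :sql_last_value against the tracking column for the delta loads. Table and column names here are placeholders only:

-- illustrative only: real table and tracking column are parameterized in my setup
SELECT *
FROM my_table
WHERE tracking_id > :sql_last_value
ORDER BY tracking_id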
I was able to pull and index a data set of up to 25k records in a few minutes, but with a full load of 4m records it takes more than 24 hrs to pull the data from the Oracle table, while indexing into ES takes about 90 mins.
I could not tell whether jdbc_fetch_size was actually doing anything during the 25k run. It seems jdbc_fetch_size has no impact, and the Oracle DB runs a single query to pull and prepare the full result set of 4m records, which takes ~24 hrs.
What would be an alternative here to improve the fetch performance? I faced an almost identical issue with MySQL, except there I was getting an OutOfMemory error when running Logstash for a 28m-record load.
I fixed the MySQL issue in Logstash by setting useCursorFetch=true in the connection string. How can I achieve the same in the case of Oracle?
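For reference, this is roughly what the working MySQL setup looked like; host, port, and database name are placeholders, and the fetch size was still set via jdbc_fetch_size in the input block:

# placeholder host/db; useCursorFetch=true is what made MySQL stream instead of buffering the whole result set
jdbc_connection_string => "jdbc:mysql://db-host:3306/mydb?useCursorFetch=true"
jdbc_fetch_size => 1000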
I have already tried jdbc_paging_enabled and setting a fetch size in the Oracle connection string, but nothing seems to help resolve the performance issue with the DB fetch. My paging attempt is shown below.
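In case it matters, this is how I had enabled paging in the jdbc input; the page size is just a value I experimented with, not a tuned recommendation:

# paging attempt inside the same jdbc input block; 100000 is only an example value I tried
jdbc_paging_enabled => true
jdbc_page_size => 100000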