Logstash jdbc input jdbc_fetch_size not working making first load extremely slower

I am trying to pull all the data from oracle DB table for first load and then it would be delta based on tracking column and schedule.

Below is the jdbc conf in logstash:

input {
	jdbc { 
		jdbc_connection_string => "${oracle_jdbc_connection_string}"
		jdbc_user => "${oracle_jdbc_user}"
		jdbc_password => "${oracle_jdbc_password}"
		jdbc_driver_library => ""
		jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
		jdbc_fetch_size => 1000
        statement_filepath => "${logstash_project_path}"
		last_run_metadata_path => "${logstash_project_data_path}"
		use_column_value => true
		tracking_column => "${tracking_column}"
		tracking_column_type => "numeric"
		schedule => "* * * * * *"
		type => "${type}" 

I was able to pull and index a data set up to 25k in fewer mins but when trying full load of 4m records it takes more than 24 hrs to pull data from oracle table and indexing to ES takes 90 mins.

Did not realize if jdbc_fetch_size is working while i ran logstash for 25k records ? It seems jdbc_fetch_size has no impact and oracle DB tries to run one query to pull & prepare result set of 4m records which takes ~24 hrs.

What would be the alternative here to improve the performance of fetch? i faced almost similar issue with MySQL but i was getting OutOfMemory when executed logstash for 28m records load.

I fixed the issue with MySQL logstash by setting useCursorFetch=true in connection string. How the same i can achieve in case of Oracle.

I already tried jdbc_paging_enabled and fetch size in oracle connection string but nothing seems to help resolve the performance issue with DB fetch.

1 Like

Can any body help me here ?

how long does it take to get the 4m records if you run the query directly in the db?

fetch_size is used to determine number of rows to be fetched everytime the driver make database calls, then store them in memory. if you don’t limit the number of results, the database will still prepare all 4M records. it’s just that the driver will fetch 25k rows every time.

1 Like

I could fetch 100K in 30 mins , gets stuck for full 4M fetch from DB. My concern is jdbc_fetch_size it doesn't have any effect at all. I tried to limit the batch size in query itself by using ROWNUM but after fetching first batch, next batch stucks forever.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.