Logstash jdbc input jdbc_fetch_size not working making first load extremely slower

MChat · April 29, 2020, 2:59am

I am trying to pull all the data from oracle DB table for first load and then it would be delta based on tracking column and schedule.

Below is the jdbc conf in logstash:


input {
	jdbc { 
		jdbc_connection_string => "${oracle_jdbc_connection_string}"
		jdbc_user => "${oracle_jdbc_user}"
		jdbc_password => "${oracle_jdbc_password}"
		jdbc_driver_library => ""
		jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
		jdbc_fetch_size => 1000
        statement_filepath => "${logstash_project_path}"
		last_run_metadata_path => "${logstash_project_data_path}"
		use_column_value => true
		tracking_column => "${tracking_column}"
		tracking_column_type => "numeric"
		schedule => "* * * * * *"
		type => "${type}" 
     }
}

I was able to pull and index a data set up to 25k in fewer mins but when trying full load of 4m records it takes more than 24 hrs to pull data from oracle table and indexing to ES takes 90 mins.

Did not realize if jdbc_fetch_size is working while i ran logstash for 25k records ? It seems jdbc_fetch_size has no impact and oracle DB tries to run one query to pull & prepare result set of 4m records which takes ~24 hrs.

What would be the alternative here to improve the performance of fetch? i faced almost similar issue with MySQL but i was getting OutOfMemory when executed logstash for 28m records load.

I fixed the issue with MySQL logstash by setting useCursorFetch=true in connection string. How the same i can achieve in case of Oracle.

I already tried jdbc_paging_enabled and fetch size in oracle connection string but nothing seems to help resolve the performance issue with DB fetch.

MChat · May 15, 2020, 11:01pm

Can any body help me here ?

ptamba · May 16, 2020, 6:57am

how long does it take to get the 4m records if you run the query directly in the db?

fetch_size is used to determine number of rows to be fetched everytime the driver make database calls, then store them in memory. if you don’t limit the number of results, the database will still prepare all 4M records. it’s just that the driver will fetch 25k rows every time.

MChat · May 26, 2020, 2:33pm

I could fetch 100K in 30 mins , gets stuck for full 4M fetch from DB. My concern is jdbc_fetch_size it doesn't have any effect at all. I tried to limit the batch size in query itself by using ROWNUM but after fetching first batch, next batch stucks forever.

system · June 23, 2020, 2:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is the jdbc fetch_size the same like Logstash batch_size? Logstash	1	1350	July 6, 2017
Logstash jdbc-input taking long time to load data from Oracle DataBase Logstash	22	4207	October 5, 2020
Logstash JDBC plugin Logstash	3	3468	June 12, 2017
Logstash JDBC input malfunction Logstash	2	868	July 6, 2017
JDBC Fetch size VS JDBC Page Size Logstash	1	2283	March 15, 2019

Logstash jdbc input jdbc_fetch_size not working making first load extremely slower

Related topics