Logstash - using jdbc input plugin as input. And pipiline delay as 30 seconds and batch size as 1000. Still input plugin reads all data from database doesnt get impacted by pipeline configuration of size and delay

Lovin_Saini · October 5, 2023, 12:46pm

JDBC input plugin

input {
    jdbc {
          jdbc_driver_library => "/usr/share/logstash/driver/mysql-connector-java-8.0.32.jar"
          jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
          jdbc_connection_string => "${MYSQL_URL}?zeroDateTimeBehavior=convertToNull"
          last_run_metadata_path => "/tmp/sql_last_value_customer.yml"
          jdbc_user => "${MYSQL_USER}"
          jdbc_password => "${MYSQL_PASS}"
          statement => "SELECT cust.id AS 'customers.id', cust.name AS 'customers.name' FROM customers cust WHERE cust.created_at > :sql_last_value  ORDER BY cust.created_at limit :size offset :offset"
          use_column_value => true
          jdbc_paging_mode => "explicit"
          tracking_column => "customers.id"
          tracking_column_type => "numeric"
          record_last_run => true
          jdbc_paging_enabled => true
          jdbc_page_size => 500
          enable_metric => true
    }
}

Pipeline config:

pipeline.batch.size: 1000
pipeline.batch.delay: 30000

I expect jdbc input plugin to fetch 1000 records and wait for 30 sec then again fetch next 1000 records and wait for 30 sec goes on....

But it doesnt happen it fetch contineously all data from database doesnt wait for 30 secs.

PodarcisMuralis · October 5, 2023, 7:54pm

Hi.
I did not test it but firstly, using jdbc_page_size may cause this issue because it means plugin will be reading large numbers of rows from database.

I used also once this jdbc input plugin and as far as i remember i used this to control the delay.
schedule => "*/30 * * * *"

Lovin_Saini · October 6, 2023, 5:16am

Hi,
Seems like jdbc input plugin doesn't cater pipeline config. So even if pipeline config is there to control pipeline it can happen that this config gets ignored, all it depends on plugin used. This is my understanding.

I tried Schedule but it is usefull in case there is no pagination and the sql query is fetching latest data only if any. Because it re runs the plugin as per schedule so it doesn't go with pagination where there is a dynamic query with limit/offset.

Badger · October 6, 2023, 6:30pm

pipeline.batch.delay tells logstash how long to wait before flushing a batch to the pipeline if the batch has fewer than pipeline.batch.size events in it. If a batch reaches pipeline.batch.size then it will be flushed immediately.

So if your jdbc input is fetching 10500 rows I would expect 10 batches to be flushed immediately and the last 500 rows to follow 30 seconds later.

Lovin_Saini · October 18, 2023, 1:56pm

Thanks a lot for your reply. It is happening in similar way what you are explaining.

I wanted to control pipeline pace so that it won't hit database a lot. Currently database cpu is getting in higher limits whenever i run it and i am not seeing any ways to control logstash pipeline pace. Only thing is i can reduce worker count to 1 but then also if query is complex it will again causes database cpu to get higher.

As a solution i am planning to move to cron based prepared statement where i can specify frequency probably that can help.

system · November 15, 2023, 1:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
JDBC input has irregular schedule intervals Logstash jdbc	6	43	August 9, 2024
Logstash jdbc plugin internals Logstash	5	521	July 6, 2020
Logstash jdbc input jdbc_fetch_size not working making first load extremely slower Logstash	4	2018	June 23, 2020
Logstash jdbc input plugin pagination not working on version 7.4.2 Logstash	1	1078	January 10, 2020
Logstash JDBC input plugin skips few scheduled runs Logstash jdbc	6	75	July 17, 2024

Logstash - using jdbc input plugin as input. And pipiline delay as 30 seconds and batch size as 1000. Still input plugin reads all data from database doesnt get impacted by pipeline configuration of size and delay

Related topics