Logstash - using jdbc input plugin as input. And pipiline delay as 30 seconds and batch size as 1000. Still input plugin reads all data from database doesnt get impacted by pipeline configuration of size and delay

JDBC input plugin

input {
    jdbc {
          jdbc_driver_library => "/usr/share/logstash/driver/mysql-connector-java-8.0.32.jar"
          jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
          jdbc_connection_string => "${MYSQL_URL}?zeroDateTimeBehavior=convertToNull"
          last_run_metadata_path => "/tmp/sql_last_value_customer.yml"
          jdbc_user => "${MYSQL_USER}"
          jdbc_password => "${MYSQL_PASS}"
          statement => "SELECT cust.id AS 'customers.id', cust.name AS 'customers.name' FROM customers cust WHERE cust.created_at > :sql_last_value  ORDER BY cust.created_at limit :size offset :offset"
          use_column_value => true
          jdbc_paging_mode => "explicit"
          tracking_column => "customers.id"
          tracking_column_type => "numeric"
          record_last_run => true
          jdbc_paging_enabled => true
          jdbc_page_size => 500
          enable_metric => true

Pipeline config:

pipeline.batch.size: 1000
pipeline.batch.delay: 30000

I expect jdbc input plugin to fetch 1000 records and wait for 30 sec then again fetch next 1000 records and wait for 30 sec goes on....

But it doesnt happen it fetch contineously all data from database doesnt wait for 30 secs.

I did not test it but firstly, using jdbc_page_size may cause this issue because it means plugin will be reading large numbers of rows from database.

I used also once this jdbc input plugin and as far as i remember i used this to control the delay.
schedule => "*/30 * * * *"

Seems like jdbc input plugin doesn't cater pipeline config. So even if pipeline config is there to control pipeline it can happen that this config gets ignored, all it depends on plugin used. This is my understanding.

I tried Schedule but it is usefull in case there is no pagination and the sql query is fetching latest data only if any. Because it re runs the plugin as per schedule so it doesn't go with pagination where there is a dynamic query with limit/offset.

pipeline.batch.delay tells logstash how long to wait before flushing a batch to the pipeline if the batch has fewer than pipeline.batch.size events in it. If a batch reaches pipeline.batch.size then it will be flushed immediately.

So if your jdbc input is fetching 10500 rows I would expect 10 batches to be flushed immediately and the last 500 rows to follow 30 seconds later.

1 Like

Thanks a lot for your reply. It is happening in similar way what you are explaining.

I wanted to control pipeline pace so that it won't hit database a lot. Currently database cpu is getting in higher limits whenever i run it and i am not seeing any ways to control logstash pipeline pace. Only thing is i can reduce worker count to 1 but then also if query is complex it will again causes database cpu to get higher.

As a solution i am planning to move to cron based prepared statement where i can specify frequency probably that can help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.