Logstash jdbc plugin internals

Hi,
I have some questions regarding the internals of logstash when using the jdbc input plugin.
I would like to mention that my current logstash configuration is working properly, but I want to know how the jdbc_page_size and the :sql_last_value work together...

Say, for instance, that in my logstash pipeline I have the following configuration:

statement => select * from table t where t.id > :sql_last_value order by t.id
jdbc_paging_enabled => true
jdbc_page_size => 10000
use_column_value => true
tracking_column => "id"

In the documentation it is stated that jdbc_paging_enabled causes the sql statement to be broken in multiple queries...
Say I have 100.000 records in the DB, how would logstash execute the statements?

1st:
select * from table t where t.id > 0 order by t.id FETCH NEXT 10000 ROWS ONLY;
(set the last id to :sql_last_value. eg. 10000)

2nd:
select * from table t where t.id > 10000 order by t.id FETCH NEXT 10000 ROWS ONLY;
(set the last id to :sql_last_value. eg. 20000)

3nd:
select * from table t where t.id > 20000 order by t.id FETCH NEXT 10000 ROWS ONLY;
(set the last id to :sql_last_value. eg. 30000)

etc

Is this accurate?

In the source code (https://github.com/chaodhib/logstash-input-jdbc/blob/4e8d1d7fbfd01fec5840530222f982b306e8bb22/lib/logstash/plugin_mixins/jdbc.rb) it is written: "This will cause a sql statement to be broken up into multiple queries. Each query will use limits and offsets to collectively retrieve the full result-set."
In Oracle there is no limit or offset so how does it work? It uses FETCH like I mentioned before or ROWNUM?

when is the value of :sql_last_value written to the metadata file?

Every time a statement is executed.

you mean the statement defined in the jdbc input plugin, so:

statement => select * from table t where t.id > :sql_last_value order by t.id

or after each one of the statements that are internally created from using jdbc_paging_enabled=true?

My reading is that paging is internal to the mixin statement handler, so it is once for the statement defined in the input option.

1 Like

Because I dont really know how the sql statement is perfomed under the hood, I ended up changing the query for something like:

statement => select * from (select * from table t where t.id > :sql_last_value order by t.id) where rownum < 250k

In this case the queries to the DB are much faster and I end not really needing jdbc_paging_enabled => true. I get less indexed documents per minute but its good enough.

Also, this way the metadata file is written after 250K indexed documents instead of after the entirety of the records have been indexed.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.