Hello all,
We have a Logstash pipeline with a jdbc input that runs really slowly. I investigated and found something, but I am not sure whether this could be improved on the Logstash side or whether it is something we need to handle ourselves.
I will try to be as detailed as possible.
Target DB: Oracle 19c
Table size: about 39 million rows (9 GB)
Tracking Column definition: TIMESTAMP(6)
Index on Tracking Column: yes
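For context, the tracking column and its index look roughly like this (just a sketch, the index name is made up):

LASTWORKSTATECHANGE TIMESTAMP(6)

CREATE INDEX TRANSFER_ORDER_LWSC_IDX ON TRANSFER_ORDER (LASTWORKSTATECHANGE);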
Logstash input config (unrelated settings removed):
input {
  jdbc {
    jdbc_driver_library => "/opt/elastic/logstash/jdbc/ojdbc8_21.5/ojdbc8-21.5.0.0.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_validate_connection => true
    jdbc_fetch_size => 1000
    schedule => "* * * * *"
    statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > :sql_last_value ORDER BY LASTWORKSTATECHANGE ASC"
    use_column_value => true
    tracking_column => "lastworkstatechange"
    tracking_column_type => "timestamp"
  }
}
The SQL statement written to the pipeline log was:
SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00' ORDER BY LASTWORKSTATECHANGE ASC
The SQL execution took between 46 and 217 seconds despite the index, so I checked the explain plan.
It shows that the index cannot be used because Oracle applies functions to the column value, presumably because the substituted literal is a TIMESTAMP WITH TIME ZONE (note the +00:00) while the column is a plain TIMESTAMP(6).
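If anyone wants to reproduce the check, the plan can be inspected with something like this (just a sketch using the statement from the log):

EXPLAIN PLAN FOR
SELECT * FROM TRANSFER_ORDER
WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00'
ORDER BY LASTWORKSTATECHANGE ASC;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);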
As a test, I modified the SQL so that the conversion happens on :sql_last_value instead, by converting the timestamp to a string and back again. The final SQL now is:
SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(TIMESTAMP '2024-10-30 13:42:37.477000 +00:00', 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC
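In the pipeline config this corresponds to a statement along these lines (only the statement changes, the rest of the jdbc input stays as above):

statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(:sql_last_value, 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC"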
Now the SQL completes in 2 ms! The new explain plan shows that the index is used.
Does anyone have an idea how I can avoid this ugly workaround? Or could the jdbc input be improved here?
Best regards
Wolfram