Slow SQL with timestamp on Oracle

Hello all,

We have a Logstash pipeline with a jdbc input that runs really slowly. I investigated and found something, but I am not sure whether it could be improved on the Logstash side or whether it is something we need to handle ourselves.

I will try to be as detailed as possible.

Target DB: Oracle 19c
Table size: about 39 million entries (9 GB)
Tracking Column definition: TIMESTAMP(6)
Index on Tracking Column: yes
Logstash input config (unnecessary information removed):

input {
    jdbc {
      jdbc_driver_library => "/opt/elastic/logstash/jdbc/ojdbc8_21.5/ojdbc8-21.5.0.0.jar"
      jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
      jdbc_validate_connection => true
      jdbc_fetch_size => 1000
      schedule => "* * * * *" 
      statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > :sql_last_value ORDER BY LASTWORKSTATECHANGE ASC"
      use_column_value => true
      tracking_column => "lastworkstatechange"
      tracking_column_type => "timestamp"
    }
}

The SQL written to the pipeline log was:

SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00' ORDER BY LASTWORKSTATECHANGE ASC

The SQL execution time was between 46 and 217 seconds despite the index, so I checked the explain plan.

The plan shows that the index cannot be used because Oracle applies functions to the column value (most likely because the substituted literal carries a time zone offset, so the TIMESTAMP(6) column is implicitly converted for the comparison).
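
For anyone who wants to reproduce the check, the plan can be generated along these lines (illustrative only, reusing the logged statement from above):

EXPLAIN PLAN FOR
  SELECT * FROM TRANSFER_ORDER
   WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00'
   ORDER BY LASTWORKSTATECHANGE ASC;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);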

As a test, I modified the SQL so that the timestamp conversion is applied to :sql_last_value instead, by converting the timestamp to a string and back again. The final SQL is now:

SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(TIMESTAMP '2024-10-30 13:42:37.477000 +00:00', 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC

Now the SQL completes in 2 ms! The explain plan now shows that the index is used.
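
For completeness, the workaround expressed in the jdbc input would look roughly like this (a sketch only; it assumes :sql_last_value is substituted as the TIMESTAMP literal shown in the logged SQL):

statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(:sql_last_value, 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC"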

Does anyone have an idea what I can do so that I don't need this ugly workaround? Maybe the jdbc input could be improved here?

Best regards
Wolfram

I've only ever run into ugly workarounds when it comes to Logstash and :sql_last_value :person_shrugging:

I've also had issues where the last value went missing, and then we were potentially loading much more data than expected.

What I always end up doing (in both MSSQL and Oracle) is creating a separate table to store the last value - a config table, if you will.

Then I use a stored proc that runs the query looking for new data and also updates the last value checked, roughly as sketched below.
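
Something like this rough sketch, with example names for the config table, procedure, and pipeline key (not the actual objects from any real system):

-- Config table that stores the last value processed per pipeline
CREATE TABLE ETL_SYNC_STATE (
  PIPELINE_NAME  VARCHAR2(64) PRIMARY KEY,
  LAST_CHANGE_TS TIMESTAMP(6)
);

-- Procedure that returns the new rows and advances the stored last value
CREATE OR REPLACE PROCEDURE GET_NEW_TRANSFER_ORDERS (p_rows OUT SYS_REFCURSOR) AS
  v_last TIMESTAMP(6);
BEGIN
  SELECT LAST_CHANGE_TS INTO v_last
    FROM ETL_SYNC_STATE
   WHERE PIPELINE_NAME = 'transfer_order';

  OPEN p_rows FOR
    SELECT *
      FROM TRANSFER_ORDER
     WHERE LASTWORKSTATECHANGE > v_last
     ORDER BY LASTWORKSTATECHANGE ASC;

  -- Simplified: a real version would track the max value actually returned by the cursor
  UPDATE ETL_SYNC_STATE
     SET LAST_CHANGE_TS = (SELECT MAX(LASTWORKSTATECHANGE) FROM TRANSFER_ORDER)
   WHERE PIPELINE_NAME = 'transfer_order';

  COMMIT;
END;
/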

Still ugly? Yeah, but at least you maintain control in your DB instead of in a Logstash file.

Hello Eddie,

Glad I am not the only one :wink: Yes, a stored procedure would work, but I prefer to see what happens in the pipeline - especially when it is as simple as this case.

Let's see if someone from the Logstash team has input here.