Slow SQL with timestamp on Oracle

Hello all,

We have a Logstash pipeline with a jdbc input that runs really slowly. I investigated and found something, but I am not sure whether it could be improved on the Logstash side or whether it is something we need to handle ourselves.

I will try to be as detailed as possible.

Target DB: Oracle 19c
Table size: about 39 million entries (9 GB)
Tracking Column definition: TIMESTAMP(6)
Index on Tracking Column: yes
Logstash input config (unnecessary information removed):

input {
    jdbc {
      jdbc_driver_library => "/opt/elastic/logstash/jdbc/ojdbc8_21.5/ojdbc8-21.5.0.0.jar"
      jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
      jdbc_validate_connection => true
      jdbc_fetch_size => 1000
      schedule => "* * * * *" 
      statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > :sql_last_value ORDER BY LASTWORKSTATECHANGE ASC"
      use_column_value => true
      tracking_column => "lastworkstatechange"
      tracking_column_type => "timestamp"
    }
}

The SQL written to the pipeline log was:

SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00' ORDER BY LASTWORKSTATECHANGE ASC

The SQL execution time was between 46 and 217 seconds despite the index, so I checked the explain plan.

The plan shows that the index cannot be used because Oracle applies functions to the column value (most likely because the substituted literal carries a time zone offset, so the TIMESTAMP(6) column is implicitly converted for the comparison).
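
For anyone who wants to reproduce the check, the plan can be generated along these lines (illustrative only, reusing the logged statement from above):

EXPLAIN PLAN FOR
  SELECT * FROM TRANSFER_ORDER
   WHERE LASTWORKSTATECHANGE > TIMESTAMP '2024-10-30 13:33:05.308000 +00:00'
   ORDER BY LASTWORKSTATECHANGE ASC;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);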

As a test, I modified the SQL so that the timestamp conversion is applied to :sql_last_value instead, by converting the timestamp to a string and back again. The final SQL is now:

SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(TIMESTAMP '2024-10-30 13:42:37.477000 +00:00', 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC

Now the SQL completes in 2 ms! The explain plan now shows that the index is used.
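
For completeness, the workaround expressed in the jdbc input would look roughly like this (a sketch only; it assumes :sql_last_value is substituted as the TIMESTAMP literal shown in the logged SQL):

statement => "SELECT * FROM TRANSFER_ORDER WHERE LASTWORKSTATECHANGE > TO_TIMESTAMP(TO_CHAR(:sql_last_value, 'DD-Mon-RR HH24:MI:SS.FF'), 'DD-Mon-RR HH24:MI:SS.FF') ORDER BY LASTWORKSTATECHANGE ASC"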

Does anyone have an idea what I can do so that I don't need this ugly workaround? Maybe the jdbc input could be improved here?

Best regards
Wolfram

I've only ever run into ugly workarounds when it comes to Logstash and :sql_last_value :person_shrugging:

I've also had issues where the last value went missing, and then we were potentially loading much more data than expected.

What I always end up doing (in both MSSQL and Oracle) is creating a separate table to store the last value - a config table, if you will.

Then I use a stored proc that runs the query looking for new data and also updates the last value checked, roughly as sketched below.
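
Something like this rough sketch, with example names for the config table, procedure, and pipeline key (not the actual objects from any real system):

-- Config table that stores the last value processed per pipeline
CREATE TABLE ETL_SYNC_STATE (
  PIPELINE_NAME  VARCHAR2(64) PRIMARY KEY,
  LAST_CHANGE_TS TIMESTAMP(6)
);

-- Procedure that returns the new rows and advances the stored last value
CREATE OR REPLACE PROCEDURE GET_NEW_TRANSFER_ORDERS (p_rows OUT SYS_REFCURSOR) AS
  v_last TIMESTAMP(6);
BEGIN
  SELECT LAST_CHANGE_TS INTO v_last
    FROM ETL_SYNC_STATE
   WHERE PIPELINE_NAME = 'transfer_order';

  OPEN p_rows FOR
    SELECT *
      FROM TRANSFER_ORDER
     WHERE LASTWORKSTATECHANGE > v_last
     ORDER BY LASTWORKSTATECHANGE ASC;

  -- Simplified: a real version would track the max value actually returned by the cursor
  UPDATE ETL_SYNC_STATE
     SET LAST_CHANGE_TS = (SELECT MAX(LASTWORKSTATECHANGE) FROM TRANSFER_ORDER)
   WHERE PIPELINE_NAME = 'transfer_order';

  COMMIT;
END;
/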

Still ugly? Yeah, but at least you maintain control in your DB instead of in a Logstash file.

Hello Eddie,

Glad I am not the only one :wink: Yes, a stored procedure would work, but I prefer to see what happens in the pipeline - especially when it is as simple as this case.

Let's see if someone from the Logstash team has input here.