Hi all,
I am using the jdbc input plugin with Logstash 7.13.2 to retrieve records from an Oracle DB. When users change their info, I record the timestamp of the change in the dateChanged column of the DB. Through the jdbc plugin, I schedule a scrape of the DB every minute, retrieving all records that changed since the timestamp of the last pipeline run. I also keep a record of which document IDs were updated or inserted into the index via a file output. Here is my configuration:
input {
  jdbc {
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_driver_library => "/usr/local/bin/ojdbc8.jar"
    jdbc_connection_string => "jdbc:oracle:thin:@<connectionstring>"
    jdbc_default_timezone => "Europe/Berlin"
    jdbc_user => "user"
    jdbc_password => "pass"
    statement_filepath => "/path/to/update.sql"
    schedule => "* * * * *"
    last_run_metadata_path => "/path/to/.logstash_jdbc_update"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    doc_as_upsert => true
    action => "update"
    index => "myIndex"
    document_id => "%{_id}"
  }
}

output {
  file {
    path => "/path/to/log_file"
    codec => line { format => "[ %{@timestamp}, %{date_changed} ] Updating / Inserting Document ID: %{_id}" }
  }
}
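For reference, each pipeline run appends one line per upserted record to the log file. The values below are made up, but this is the shape:

[ 2021-07-01T08:15:00.123Z, 2021-07-01T08:14:32.000Z ] Updating / Inserting Document ID: 12345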
My SQL statement (update.sql) incorporates the :sql_last_value parameter as follows:
SELECT *
FROM <table>
WHERE state = 'A'
  AND TO_TIMESTAMP_TZ(TO_CHAR(dateChanged, 'DD-MM-YYYY hh24:mi:ss'), 'DD-MM-YYYY hh24:mi:ss') AT TIME ZONE 'Europe/Berlin'
      >= TO_TIMESTAMP_TZ(FROM_TZ(CAST(:sql_last_value AS TIMESTAMP), 'UTC')) AT TIME ZONE 'Europe/Berlin'
After many trials, I found that this long TO_TIMESTAMP_TZ clause ensures that my dateChanged timestamp and :sql_last_value are compared in the same time zone.
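To illustrate what the right-hand side does with a made-up value (the timestamp below is hypothetical, and I'm assuming :sql_last_value is persisted in UTC, which is what the comparison relies on):

-- Hypothetical walkthrough for :sql_last_value = '2021-07-01 08:14:00' (UTC):
-- CAST(:sql_last_value AS TIMESTAMP)  -> 2021-07-01 08:14:00 (no time zone info)
-- FROM_TZ(..., 'UTC')                 -> 2021-07-01 08:14:00 UTC
-- ... AT TIME ZONE 'Europe/Berlin'    -> 2021-07-01 10:14:00 Europe/Berlin (CEST, UTC+2)
-- The left-hand side rebuilds dateChanged as a Europe/Berlin timestamp the same way,
-- so both sides of the >= comparison end up in the same time zone.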
The query returns correct results, and the pipeline ingests them into my index without issue. However, at some point the jdbc timestamp stops refreshing, so the scheduled pipeline runs with the same :sql_last_value every minute. Meanwhile, users continue to use my application and change their information, which means my query results grow and grow.
The stale timestamp is reflected in my log file, where the same record IDs show up again on every iteration.
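For what it's worth, my understanding is that the file at last_run_metadata_path should hold a single YAML-serialized timestamp that the plugin rewrites after each run, something like this (value hypothetical):

--- 2021-07-01 08:14:00.000000000 Z

and it is this value that apparently stops advancing.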
My question: why isn't the timestamp refreshing? I was under the impression that it refreshes after every query, and I haven't seen any SQL errors show up in any of my logs either. It's very strange behavior, and I can't make sense of it.
In case it's relevant: this pipeline is defined in a pipelines.yml file alongside two others. One works essentially the same way but removes records from the index, and the other is a simple Beats config.
- pipeline.id: upsert
  path.config: "path/to/update.conf"
- pipeline.id: delete
  path.config: "path/to/delete.conf"
- pipeline.id: winbeats
  path.config: "path/to/winbeats.conf"