JDBC Timestamp doesn't refresh, leads to large queues

Hi all,

I am using the jdbc input plugin with Logstash 7.13.2 to retrieve records from an Oracle DB. When users change their info, I record the timestamp of the change in the dateChanged column of the DB. Through the jdbc plugin, I schedule a per-minute scrape of the DB, retrieving all records changed since the timestamp of the last run of the pipeline. I note which record IDs were updated / inserted in the index via an output that writes to a file. Here is my configuration:

input {
        jdbc {
                jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
                jdbc_driver_library => "/usr/local/bin/ojdbc8.jar"
                jdbc_connection_string => "jdbc:oracle:thin:@<connectionstring>"
                jdbc_default_timezone => "Europe/Berlin"
                jdbc_user => "user"
                jdbc_password => "pass"
                statement_filepath => "/path/to/update.sql"
                schedule => "* * * * *"
                last_run_metadata_path => "/path/to/.logstash_jdbc_update"
        }
}

output {
        elasticsearch {
                hosts => ["localhost:9200"]
                doc_as_upsert => true
                action => "update"
                index => "myIndex"
                document_id => "%{_id}"
        }
}

output {
        file {
                path => "/path/to/log_file"
                codec => line { format => "[ %{@timestamp}, %{date_changed} ] Updating / Inserting Document ID: %{_id}" }
        }
}

My SQL statement incorporates the :sql_last_value timestamp as follows:

   WHERE ((state = 'A')
   AND (
   TO_TIMESTAMP_TZ(TO_CHAR(dateChanged, 'DD-MM-YYYY hh24:mi:ss'), 'DD-MM-YYYY hh24:mi:ss') AT TIME ZONE 'Europe/Berlin' >= TO_TIMESTAMP_TZ(FROM_TZ(CAST(:sql_last_value AS TIMESTAMP), 'UTC')) AT TIME ZONE 'Europe/Berlin'
   ))

After much trial and error, I found that this long TO_TIMESTAMP_TZ construction ensures that my dateChanged timestamp and :sql_last_value are compared in the same timezone.
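
As an aside, the plugin can also track the column value itself rather than the wall-clock time of the last run, which sidesteps some of this timezone juggling. A sketch using the plugin's documented tracking options (the values here are placeholders matching my setup, not a tested config; note the plugin lowercases column names by default, hence "datechanged"):

    jdbc {
            # ... connection settings as above ...
            use_column_value => true
            tracking_column => "datechanged"
            tracking_column_type => "timestamp"
            statement_filepath => "/path/to/update.sql"
            schedule => "* * * * *"
            last_run_metadata_path => "/path/to/.logstash_jdbc_update"
    }

With this, :sql_last_value holds the highest datechanged seen so far, and the WHERE clause can compare against it directly.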

The query returns correct results, and the pipeline injects them into my index with no problem. At some point, however, the jdbc timestamp stops refreshing, which causes the scheduled pipeline to run with the same :sql_last_value every minute! And while the timestamp stays frozen, users continue to use my application and change their information, so my query results grow and grow.

The stalled timestamp is also reflected in my log files, where I see the same record IDs grabbed again on every iteration.

My question: why isn't the timestamp refreshing? I was under the impression that it refreshes after every query, and I haven't seen any SQL errors show up in any of my logs either. It's very strange behavior, and I can find no rhyme or reason behind it.

If it's important, this pipeline is defined in a pipelines.yml file alongside two other pipelines: one functions essentially the same way but for records that should be removed from the index, and the other is a simple Beats config.

- pipeline.id: upsert
  path.config: "path/to/update.conf"
- pipeline.id: delete
  path.config: "path/to/delete.conf"
- pipeline.id: winbeats
  path.config: "path/to/winbeats.conf"

I once had the same problem. Does the Logstash service have permission to update the metadata file?

It does indeed. If I look back into the logs, I see :sql_last_value updating as the job runs. Then, seemingly out of nowhere, the timestamp fails to refresh, and from that point on the logs show the same timestamp in place of :sql_last_value.
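
For anyone else wanting to verify the same thing, a minimal sketch of the check (the real path is whatever last_run_metadata_path points at; a /tmp stand-in and an example timestamp are used here so the snippet is self-contained):

```shell
# Stand-in for the path set in last_run_metadata_path.
meta=/tmp/.logstash_jdbc_update

# Simulate what the plugin writes: a YAML-serialized timestamp.
printf -- '--- 2021-06-01 09:41:00.000000000 +00:00\n' > "$meta"

# Inspect the stored value and confirm the file is writable
# by the current user (run this as the Logstash service user).
cat "$meta"
test -w "$meta" && echo "metadata is writable"
```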

The pipelines are wrapped up in a service, so I'll investigate those logs to see if any errors are being recorded. I know I've received Rufus scheduler errors before, due to timestamps being expected in a certain format and the scheduler being unable to run, so this could be the problem.

Hi all,

To anyone having this issue, I may have found the problem, though I don't understand the underlying logic behind it.

In any case, what you'll notice in my configuration is an output plugin that writes information about each updated / inserted document to a file. For privacy reasons I anonymized the path, but it actually pointed at the same log file the pipeline itself writes to. I did this to consolidate information about the pipeline in one place, but it seems that during heavy document loads this interferes with the pipeline's ability to update the timestamp. Perhaps because the output plugin kept the file open, the pipeline was prevented from writing its jdbc metadata to the other file? I'm not sure.

I've since changed that path to a different log file, and my issue has seemingly vanished. I've also tried blasting my pipeline with thousands of documents at once, and it took them like a champ. For this reason, I believe that writing to the same file as the pipeline was the issue.

TL;DR: Don't have an output plugin that writes to the same log file as the pipeline itself. While it may work for low numbers of documents, during high loads it seems to prevent the pipeline from functioning correctly.
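
Concretely, the fix amounts to pointing the file output somewhere other than Logstash's own log. A sketch of the separated output (the path is a placeholder, not my actual one):

    output {
            file {
                    # Write to a dedicated audit file, NOT the log file
                    # that Logstash itself writes to.
                    path => "/path/to/document_audit.log"
                    codec => line { format => "[ %{@timestamp}, %{date_changed} ] Updating / Inserting Document ID: %{_id}" }
            }
    }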

An unfortunate hello again all,

The problem has managed to happen again, leaving me once more speechless and clueless as to the cause. I thought perhaps it was a filter problem in my pipeline, but when I run a full build grabbing all of the documents from the database, the pipeline processes the records in the same fashion as the update pipeline. After the full build, all records are in the index, so there's no issue with a single document clogging the pipeline somehow.

I've looked around at what else could be happening, but since my pipeline otherwise runs without any issue, the problem must lie elsewhere; where exactly, I have not a clue.

To anyone interested in this topic, I've localized the failure to update the jdbc timestamp to a Rufus scheduler error. I've opened a new topic here.