JDBC logging duplicate row

Hi there,
I have a Logstash pipeline with a JDBC input like the one below:

jdbc {
    jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
    jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
    jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
    jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
    jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    last_run_metadata_path => "/usr/share/logstash/jdbc-last-run/.last_run"
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "
        SELECT *, UNIX_TIMESTAMP(updated_at) AS unix_ts_in_secs FROM tbl WHERE updated_at > FROM_UNIXTIME(:sql_last_value) AND updated_at <= now() ORDER BY updated_at, id
    "
}

Example: the MySQL database has 1 row that matches the condition of the SQL statement above, but the Logstash log prints many rows (2 or more) with the same data (duplicates); only the @timestamp field is different.

In the end, only 1 row is inserted into Elasticsearch, but I don't know why Logstash prints so many duplicate events (only @timestamp differs). Does it affect performance or cause high CPU/memory usage?

Thank you!

You could specify a document_id in the output of your pipeline so that multiple copies of the same document overwrite a single document in Elasticsearch. You can find an example of this in the output of the example pipeline in: How to keep Elasticsearch synced with a RDBMS using Logstash and JDBC | Elastic Blog.
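For reference, here is a minimal sketch of such an output, assuming your table's primary key column is `id`; the host and index name are placeholders:

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder host
        index => "tbl_idx"                   # placeholder index name
        # Reuse the primary key as the document id so repeated copies of the
        # same row update one document instead of creating duplicates.
        document_id => "%{id}"
    }
}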


Thank you. As I said, only one row is inserted/updated in Elasticsearch (document_id is already specified in the output). My problem is that I don't know why Logstash logs a JDBC input row many times with the same data (only @timestamp differs). Can you help me explain it?

Hi, still looking for anyone who can help me with this.
