JDBC logging duplicate row

Hi there,
I have a Logstash pipeline with a JDBC input like the one below:

jdbc {
    jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
    jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
    jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
    jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
    jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    tracking_column => "unix_ts_in_secs"
    use_column_value => true
    last_run_metadata_path => "/usr/share/logstash/jdbc-last-run/.last_run"
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "
        SELECT *, UNIX_TIMESTAMP(updated_at) AS unix_ts_in_secs FROM tbl WHERE updated_at > FROM_UNIXTIME(:sql_last_value) AND updated_at <= now() ORDER BY updated_at, id
    "
}

Example: the MySQL database has 1 row that matches the condition of the SQL statement above, but the Logstash log prints many rows (2 or more) with the same data (duplicates); only the @timestamp field is different.

In the end, only 1 row is inserted into Elasticsearch, but I don't know why Logstash prints so many duplicate events (only @timestamp differs). Does it affect performance or cause high CPU/memory usage?

Thank you!

You could specify a document_id in the output of your pipeline so that multiple copies of the same document overwrite a single document in Elasticsearch. You can find an example of this in the output of the example pipeline in: How to keep Elasticsearch synced with a RDBMS using Logstash and JDBC | Elastic Blog.
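For reference, here is a minimal sketch of such an output, assuming your table's primary key column is `id`; the host and index name are placeholders:

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder host
        index => "tbl_idx"                   # placeholder index name
        # Reuse the primary key as the document id so repeated copies of the
        # same row update one document instead of creating duplicates.
        document_id => "%{id}"
    }
}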


Thank you. As I said, only one row is inserted/updated in Elasticsearch (document_id is already specified in the output). My problem is that I don't know why Logstash logs a JDBC input row many times with the same data (only @timestamp differs). Can you help me explain it?

Hi, still looking for anyone who can help me with this.
