Hi there,
I have a Logstash pipeline with a JDBC input like the one below:
jdbc {
  jdbc_driver_library => "${LOGSTASH_JDBC_DRIVER_JAR_LOCATION}"
  jdbc_driver_class => "${LOGSTASH_JDBC_DRIVER}"
  jdbc_connection_string => "${LOGSTASH_JDBC_URL}"
  jdbc_user => "${LOGSTASH_JDBC_USERNAME}"
  jdbc_password => "${LOGSTASH_JDBC_PASSWORD}"
  jdbc_paging_enabled => true
  jdbc_page_size => 1000
  # :sql_last_value is taken from this column's last seen value, not the run time
  tracking_column => "unix_ts_in_secs"
  use_column_value => true
  last_run_metadata_path => "/usr/share/logstash/jdbc-last-run/.last_run"
  tracking_column_type => "numeric"
  # six-field cron expression: run the statement every 5 seconds
  schedule => "*/5 * * * * *"
  statement => "
    SELECT *, UNIX_TIMESTAMP(updated_at) AS unix_ts_in_secs
    FROM tbl
    WHERE updated_at > FROM_UNIXTIME(:sql_last_value)
      AND updated_at <= NOW()
    ORDER BY updated_at, id
  "
}
Example: the MySQL database has only 1 row that matches the condition in the SQL statement above, but the log prints that row multiple times (2 or more) with identical data; only the @timestamp field differs.
In the end only 1 document is indexed into Elasticsearch, but I don't understand why Logstash prints the same data many times (only @timestamp differing). Does this affect performance or cause high CPU/memory usage?
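For context, the output section looks roughly like the sketch below. The hosts, index name, and id field here are placeholders rather than my exact values, but a document_id derived from the row id would explain why Elasticsearch ends up with a single document while stdout prints every duplicate event:

output {
  # prints every event to the Logstash log; this is where the duplicates show up
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["${LOGSTASH_ES_HOSTS}"]  # placeholder
    index => "tbl-index"               # placeholder index name
    # using the row id as the document id means repeated events overwrite
    # the same document, so only 1 document remains in Elasticsearch
    document_id => "%{id}"
  }
}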
Thank you!