Unique entries in JDBC

kb2295 · January 8, 2019, 5:40pm

Hello,

I'm trying to create an Elasticsearch index using the JDBC input plugin to read data from SQL Server. I have configured it to read my data at 2AM every day.

However, what happens if the system goes down at 2:05AM, having read only half the data, and restarts an hour later?
Does this mean I will have duplicate data for the first half (i.e. the data that was indexed between 2:00 and 2:05)?

What should I do to avoid a situation like this? I read a bit on states (sql_last_start) but I don't quite understand it.

Would adding a unique document ID be helpful?

This is my config file:

input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://mswpffussqua:2431;databaseName=abc"
    jdbc_user => "abc"
    jdbc_validate_connection => true
    jdbc_driver_library => "/path/to/jar/file/sqljdbc4-4.0.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    statement => "select * from dbo.tb_Account"
    schedule => "0 2 * * *"
  }
}
output {
  elasticsearch {
    hosts => ["host1","host2"]
    index => "jdbc"
    user => xxx
    password => xxxx
    ssl => true
    ssl_certificate_verification => false
    cacert => '/path/to/ca.crt'
  }
  stdout {}
}
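
To make the unique-ID idea concrete, here is a minimal sketch of how the output section might look with a document ID set. It assumes dbo.tb_Account has a unique key column, here called account_id; that name is hypothetical and not taken from the real table. The JDBC input lowercases column names by default, so the field reference is written in lowercase. With a fixed ID per row, re-reading the same row after a restart should overwrite the existing document rather than add a second copy.

```
output {
  elasticsearch {
    hosts => ["host1","host2"]
    index => "jdbc"
    # Assumption: account_id is a hypothetical unique key column in dbo.tb_Account.
    # The same row always maps to the same _id, so a repeated read replaces
    # the existing document instead of creating a duplicate.
    document_id => "%{account_id}"
  }
}
```

The remaining connection settings (user, password, ssl options) would stay as in the config above; they are left out here only to keep the sketch short.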

Thank you in advance!

