Unique entries in JDBC

Hello,

I'm trying to create an Elasticsearch index using the Logstash JDBC input plugin to read data from SQL Server. I have configured it to read my data at 2 AM every day.

However, what happens if the system goes down at 2:05 AM, having read only half the data, and restarts an hour later?
Does this mean I will have duplicate data for the first half (that is, the data that was indexed between 2:00 and 2:05)?

What should I do to avoid a situation like this? I read a bit about state tracking (sql_last_start), but I don't quite understand it.

Would adding a unique document ID be helpful?

This is my config file:

input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://mswpffussqua:2431;databaseName=abc"
    jdbc_user => "abc"
    jdbc_validate_connection => true
    jdbc_driver_library => "/path/to/jar/file/sqljdbc4-4.0.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    statement => "select * from dbo.tb_Account"
    schedule => "0 2 * * *"
  }
}
output {
  elasticsearch {
    hosts => ["host1","host2"]
    index => "jdbc"
    user => xxx
    password => xxxx
    ssl => true
    ssl_certificate_verification => false
    cacert => '/path/to/ca.crt'
  }
  stdout {}
}
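
To make the unique document ID question concrete, here is a minimal sketch of what I have in mind. It assumes dbo.tb_Account has a primary key column, called account_id here (a hypothetical name, not my real schema), and that the JDBC input exposes it as a lowercase field. As far as I understand, an event whose ID already exists in the index should overwrite that document instead of creating a second one, so a rerun after a crash would not produce duplicates. Security settings are left out for brevity and would stay the same as in my config.

output {
  elasticsearch {
    hosts => ["host1","host2"]
    index => "jdbc"
    # account_id is a hypothetical primary key column from dbo.tb_Account
    document_id => "%{account_id}"
  }
}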

Thank you in advance!
