Hi,
I am working with Logstash and Elasticsearch, but I have a problem. I want Elasticsearch to be kept up to date with the new data added to a SQL database, all done via Logstash.
I set up the configuration file as below:
input {
  jdbc {
    jdbc_connection_string => "JDBC-Connection-String"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "JDBC-Connection-User"
    jdbc_driver_library => "JDBC-Driver-Path"
    statement => "SELECT MyCol1, MyCol2 FROM MyTable"
    use_column_value => true
    tracking_column => "MyCol1"
    tracking_column_type => "numeric"
    clean_run => true
    schedule => "*/1 * * * *"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "MyIndex"
    document_id => "%{MyCol1}"
  }
  stdout { }
}
As you can see, I run a SELECT over the whole table once a minute and then add the new rows found in the DB to Elasticsearch.
It works fine, but my problem is that every time Logstash executes the query it scans all the rows in the DB (I know it is configured to do that, but I haven't found another way) and then indexes into Elasticsearch the ones that are not there yet, which obviously causes poor performance.
I am struggling to find a way to make Logstash stop scanning the whole DB every time and simply add the new rows to Elasticsearch.
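From the jdbc input documentation I assume the :sql_last_value parameter (together with the tracking_column settings and clean_run left at its default of false) is meant for exactly this. Below is a minimal sketch of what I think the incremental version of my input might look like, reusing my placeholders, but I am not sure it is the right approach:

input {
  jdbc {
    jdbc_connection_string => "JDBC-Connection-String"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "JDBC-Connection-User"
    jdbc_driver_library => "JDBC-Driver-Path"
    # :sql_last_value is replaced with the last tracked MyCol1 value,
    # so only rows added since the previous run should be fetched
    statement => "SELECT MyCol1, MyCol2 FROM MyTable WHERE MyCol1 > :sql_last_value ORDER BY MyCol1 ASC"
    use_column_value => true
    tracking_column => "MyCol1"
    tracking_column_type => "numeric"
    # clean_run omitted (defaults to false) so the tracked value persists between runs
    schedule => "*/1 * * * *"
  }
}

Is this the intended way to do it, or is there a better approach?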
P.S. MyCol1 is the Primary Key of the SQL table.