Hi,
I am working with Logstash and Elasticsearch, but I have a problem. I want Elasticsearch to be kept up to date with the new data added to a SQL database, all done via Logstash.
I set up the configuration file as below:
input {
  jdbc {
    jdbc_connection_string => "JDBC-Connection-String"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "JDBC-Connection-User"
    jdbc_driver_library => "JDBC-Driver-Path"
    statement => "SELECT MyCol1, MyCol2 FROM MyTable"
    use_column_value => true
    tracking_column => "MyCol1"
    tracking_column_type => "numeric"
    clean_run => true
    schedule => "*/1 * * * *"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "MyIndex"
    document_id => "%{MyCol1}"
  }
  stdout { }
}
As you can see, I run a SELECT over the whole table once a minute and then add the new rows found in the DB to Elasticsearch.
It works fine, but my problem is that every time Logstash executes the query it scans all the rows in the DB (I know it is configured to do that, but I haven't found another way) and then indexes into Elasticsearch the ones that are not there yet, which obviously causes poor performance.
I am struggling to find a way to make Logstash stop scanning the whole DB every time and simply add the new rows to Elasticsearch.
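From the jdbc input documentation I assume the :sql_last_value parameter (together with the tracking_column settings and clean_run left at its default of false) is meant for exactly this. Below is a minimal sketch of what I think the incremental version of my input might look like, reusing my placeholders, but I am not sure it is the right approach:

input {
  jdbc {
    jdbc_connection_string => "JDBC-Connection-String"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "JDBC-Connection-User"
    jdbc_driver_library => "JDBC-Driver-Path"
    # :sql_last_value is replaced with the last tracked MyCol1 value,
    # so only rows added since the previous run should be fetched
    statement => "SELECT MyCol1, MyCol2 FROM MyTable WHERE MyCol1 > :sql_last_value ORDER BY MyCol1 ASC"
    use_column_value => true
    tracking_column => "MyCol1"
    tracking_column_type => "numeric"
    # clean_run omitted (defaults to false) so the tracked value persists between runs
    schedule => "*/1 * * * *"
  }
}

Is this the intended way to do it, or is there a better approach?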
P.S. MyCol1 is the Primary Key of the SQL table.