Auto-update Logstash on Insert on SQL Server

Hi,
I am working with Logstash and Elasticsearch but I have a problem. In my case I want Elasticsearch to be updated with the new data added into a SQL Database, all done via Logstash.
I set the configuration file as below:

input {
    jdbc {
        jdbc_connection_string => "JDBC-Connection-String"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_user => "JDBC-Connection-User"
        jdbc_driver_library => "JDBC-Driver-Path"

        statement => "SELECT MyCol1 MyCol2 FROM MyTable"
        use_column_value => true
        tracking_column => "MyCol1"
        tracking_column_type => "numeric"
        clean_run => true
        schedule => "*/1 * * * *"
    }
}

output {
    elasticsearch {
    hosts => "http://localhost:9200"
    index => "MyIndex"
    document_id => "%{MyCol1}"
}
    stdout { }
}

As you can see, I run a SELECT into all the DB once a minute, than I add new lines found in the DB into Elasticsearch.
It works fine, but my problem is that every time logstash executes the query, it scans all the rows into the DB (I know that it is set to do that, but I haven't found another way) and then it adds in Elasticsearch the ones that are not there yet, obviously causing bad performances.
I am struggling to find a way that Logstash doesn't scans all the DB every time but simply adds the new rows into ElasticSearch.
P.S. MyCol1 is the Primary Key of the SQL table.

If you have either a timestamp or a sequence that allows you to add a WHERE clause that identifies new entries then you can persist state about which rows have been read. But if not, you have to read the whole table every time.

1 Like

Thank you so much for your response.
For those who have the same problem as me, I added this line to my configuration file:

last_run_metadata_path => "data\.logstash_jdbc_last_run"

Then I modified the query statement as it follows:

statement => "SELECT MyCol1 MyCol2 FROM MyTable WHERE MyCol1 > :sql_last_value"

As you can see, :sql_last_value is where I get the last value read from MyCol1, which is stored into "data\.logstash_jdbc_last_run"

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.