SQL to Elasticsearch, only new rows?


I'm relatively new to the ELK stack. I'm wondering how I could send data from a SQL table to Elasticsearch so that each run only sends the rows that are new since the previous run. The SQL table contains Horizon View event logs.

Currently, for testing, I have set up a JDBC connection which successfully gathers the content of the SQL table and sends it via Logstash to Elasticsearch. The problem is that it sends the entire table every time. So I'm looking for a solution that either monitors the SQL table and sends data to Elasticsearch as rows get added, or a way to prevent Logstash from resending the entire table, so I could set up a scheduled transfer.
Maybe with Beats?


Check out the tracking_column and use_column_value parameters.

If your table has an incrementing field, you can use that to do this. Say you have an `id` column that increases with every new row. The first time the pipeline runs it grabs all the data and saves the last `id` it saw to the local file system. On every subsequent run it only selects rows where `id > :sql_last_value`.
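A minimal pipeline sketch of that approach (the connection string, credentials, table name, and file paths below are placeholders for illustration, not values from this thread):

```conf
input {
  jdbc {
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=HorizonEvents"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    schedule => "*/5 * * * *"              # poll every 5 minutes
    use_column_value => true               # track a column value, not a timestamp
    tracking_column => "id"                # the incrementing column
    tracking_column_type => "numeric"
    last_run_metadata_path => "/var/lib/logstash/.horizon_last_id"
    # :sql_last_value is replaced with the saved id from the previous run
    statement => "SELECT * FROM event WHERE id > :sql_last_value ORDER BY id ASC"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "horizon-events"
  }
}
```

With `schedule` set, Logstash keeps the pipeline running and re-executes the statement on the cron interval, so only rows added since the stored `id` are shipped each time.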

I shared most of my thoughts here: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Basically, I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" that writes it to the database.
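As a rough sketch of that dual-write idea (everything here is invented for illustration; in a real application the two stores would be a SQL connection and an Elasticsearch client rather than in-memory stand-ins):

```python
# Sketch: index into Elasticsearch in the same code path that writes the
# row to SQL, so both stores stay in sync.
# sql_table and es_index are in-memory stand-ins, not real clients.

sql_table = []   # stand-in for the SQL events table
es_index = {}    # stand-in for the Elasticsearch index

def save_event(event_id, payload):
    """Write the event to the database, then index it in Elasticsearch.

    If the index step fails, roll back the database write so the two
    stores never diverge -- the "same transaction" idea.
    """
    sql_table.append({"id": event_id, **payload})   # 1. DB write
    try:
        es_index[event_id] = payload                # 2. ES index
    except Exception:
        sql_table.pop()                             # roll back on failure
        raise

save_event(1, {"type": "LOGIN", "user": "alice"})
save_event(2, {"type": "LOGOUT", "user": "alice"})
print(len(sql_table), len(es_index))  # -> 2 2
```

The upside is that new rows show up in Elasticsearch immediately, with no polling; the cost is that you need to own the write path.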

Also have a look at this "live coding" recording.

But this assumes you have access to the application that writes the data to the database.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.