I'm relatively new to the ELK stack. I'm wondering how I could send data from a SQL table to Elasticsearch in a way that every time it runs, it only sends the rows that are new since the last time it ran. The SQL table contains Horizon View event logs.
Currently, for testing, I have set up a JDBC connection that successfully gathers the content of the SQL table and sends it via Logstash to Elasticsearch. The problem is that it sends the entire table every time. So I'm looking for a solution that either monitors the SQL table and sends rows to Elasticsearch as they are added, or a way to prevent Logstash from resending the entire table each run, so I could even set up a scheduled transfer.
Maybe with beats?
Check out the tracking_column and use_column_value parameters of the jdbc input plugin.
If your table has an incrementing field you can use it for this. Say you have an id column that increments with every new row. The first run grabs all the data, and Logstash saves the last id it saw to a file on the local filesystem (controlled by last_run_metadata_path). On subsequent runs your query can filter with WHERE id > :sql_last_value, so only the new rows get sent. See the example below.
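A minimal sketch of what that looks like as a Logstash pipeline config, assuming the Horizon View events sit in a table called `event` with an incrementing `EventID` column — the table, column, and connection details are placeholders you'd swap for your own:

```
input {
  jdbc {
    # Connection details are placeholders -- adjust for your SQL Server instance.
    jdbc_driver_library     => "/path/to/mssql-jdbc.jar"
    jdbc_driver_class       => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string  => "jdbc:sqlserver://sqlhost:1433;databaseName=ViewEvents"
    jdbc_user               => "logstash"
    jdbc_password           => "secret"

    # Run the query on a schedule (every minute here) instead of only once.
    schedule => "* * * * *"

    # Track the highest id seen so far and persist it between runs.
    # The jdbc input lowercases column names by default, so use the lowercase form here.
    use_column_value       => true
    tracking_column        => "eventid"
    tracking_column_type   => "numeric"
    last_run_metadata_path => "/var/lib/logstash/.horizon_events_last_run"

    # Only fetch rows newer than the value saved from the previous run.
    statement => "SELECT * FROM event WHERE EventID > :sql_last_value ORDER BY EventID ASC"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "horizon-view-events"
  }
}
```

On each scheduled run the jdbc input substitutes the persisted value into `:sql_last_value`, so only rows inserted since the previous run are shipped to Elasticsearch.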
Basically, I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" in which you write it to the database.