I'm relatively new to Logstash, and even though I've browsed the web for a few days now, I cannot seem to figure out how to handle my problem.
I have a database with roughly 35 million records. Every minute, around 500 updates happen on that database, and I want those changes reflected in Elasticsearch as soon as possible.
Right now (before I discovered Logstash) I did the following every 10 seconds:
On update, a trigger inserts a row into a trigger table.
A process reads from that trigger table and updates the values in Elasticsearch.
But I think this could be much faster and simpler with Logstash, although I have not figured out how.
I wrote a test to insert the 35,000,000 records into Elasticsearch with the JDBC input plugin. I set fetch_size to 1000, page_size to 1000, and enabled paging. The first 200,000 rows went pretty fast, but now, 18 hours later, only 1,000,000 records have been processed, and every 1,000 records take around 10 minutes.
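For reference, this is roughly the pipeline I'm using — connection string, driver path, credentials, table, and column names are placeholders for my actual setup:

```conf
input {
  jdbc {
    # Placeholder driver and connection details
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "secret"
    # Paging settings from the test run
    jdbc_paging_enabled => true
    jdbc_page_size => 1000
    jdbc_fetch_size => 1000
    statement => "SELECT * FROM records"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "records"
    # Use the primary key so re-runs update instead of duplicating documents
    document_id => "%{id}"
  }
}
```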
And this is only the initial load. After that, I need another process to keep Elasticsearch in sync with the database.
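For the incremental sync, I was considering something along these lines — a sketch assuming the table has an updated_at timestamp column; the schedule, table, and column names are placeholders:

```conf
input {
  jdbc {
    # Same placeholder driver/connection details as the initial load
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "secret"
    # Poll every minute; only fetch rows changed since the last run
    schedule => "* * * * *"
    statement => "SELECT * FROM records WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "records"
    document_id => "%{id}"
  }
}
```

Here :sql_last_value is maintained by the plugin between runs, so each poll should only pick up rows modified since the previous one.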