How to keep Elasticsearch in sync with a database using Logstash


#1

Hi,

I'm relatively new to Logstash, and even though I've browsed the web for a few days now, I cannot work out how my problem should be handled.

I have a database with roughly 35 million records. Around 500 updates happen on that database every minute, and I want those changes reflected in Elasticsearch as soon as possible.

Right now (before I came across Logstash) I do the following:

  1. On each update, a row is inserted into a trigger table.
  2. Every 10 seconds, a process reads from that trigger table and updates the corresponding values in Elasticsearch.

But I think it could be much faster and easier with Logstash, although I have not worked out how.

Could you guys/girls help me out?
Thanks!


(Andrew Cholakian) #2

Have you seen the Logstash JDBC input? https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html


#3

Yes, I have already seen that input, but I have some concerns.

  • I cannot trigger it more than once every minute.
  • How do I allow it to only fetch the updated rows instead of the whole database? I mean, 35 million records is not a small number.
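
On the second concern, one common approach is to remember a high-water mark and only select rows changed since then. The JDBC input documents a `:sql_last_value` placeholder (with `use_column_value` and `tracking_column`) for this purpose. The sketch below shows the same idea in plain Python rather than Logstash configuration. It assumes a hypothetical `records` table with an integer `updated_at` column that is set on every write; neither is confirmed by this thread.

```python
import sqlite3


def fetch_changed(conn, last_value):
    """Return (rows, new_last_value) for rows modified after last_value.

    Assumes a hypothetical records(id, title, updated_at) table where
    updated_at is an integer that increases on every write.
    """
    rows = conn.execute(
        "SELECT id, title, updated_at FROM records "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_value,),
    ).fetchall()
    # Keep the old mark when nothing changed, so the next poll starts there.
    new_last_value = rows[-1][2] if rows else last_value
    return rows, new_last_value
```

Each poll then touches only the rows that changed, not all 35 million. One caveat: if a row is written with the same `updated_at` value as the stored mark after a poll has run, the strict `>` comparison will miss it, so the tracking column needs enough resolution or a tie-breaker.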

For the initial fill of Elasticsearch, I agree the JDBC input fits, but for keeping it up to date, I doubt it is the best way.


#4

I wrote a test to insert the 35,000,000 records into Elasticsearch with the JDBC input. I've set the fetch size to 1,000, the page size to 1,000 and enabled paging. The first 200,000 rows went pretty fast, but now, 18 hours later, only 1,000,000 records have been processed, and every 1,000 records take around 10 minutes.
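
A slowdown that grows with the number of rows already processed is consistent with paging based on `LIMIT`/`OFFSET`, where the database has to walk past every skipped row for each page. That is only a possible explanation, not a confirmed diagnosis of this case. A commonly used alternative is keyset pagination, which resumes from the last primary key seen. The sketch below assumes a hypothetical `records` table with a positive integer `id` primary key.

```python
import sqlite3


def iter_pages(conn, page_size=1000):
    """Yield pages of rows ordered by id, resuming after the last id seen.

    Each query seeks directly to id > last_id through the primary key,
    so later pages should cost about the same as earlier ones.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, title FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page
        last_id = page[-1][0]
```

Each yielded page could be handed to a bulk indexing call. Storing `last_id` somewhere durable would also let an interrupted initial load resume where it stopped.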

And this is only the initial load. I still need another process to keep Elasticsearch up to date with the database.

Does anybody know how to solve this problem?

