How can I reach real-time index updating using ES + Logstash + MySQL when data was updated, added or deleted?

(Ed) #1

Hi there, I have MySQL DB with one table without joins with ~100 millions of rows. These items can be updated or removed and also new records are saved into MySQL DB e.g. every minutes. How can I reach updating indexes in Elasticsearch when data is updated/changed/deleted in MySQL DB? Actualizing data is must have for me it's very important to stay up to date with MySQL. I'm a new in ES. Thanks. PS. I'm ready to look on other DB solutions and even NoSQL solutions if it can help me to solve my problem. The issue is to get data very fast with aggregations.

(Aaron Mildenstein) #2

In order to achieve this, you will need a separate process that monitors binlog transactions and initiates updates to the Elasticsearch end of things. Elastic does not currently offer any tools that do this, so you'll have to find one that someone open sourced, or write your own.

(Ed) #3

Thank you very much! I have another idea to achieve this: e.g. java code that handle data updating and sends requests to MySQL and to Elastic (reindex manually using e.g. bulk API) what do you think of it? Is it fast operating to reindex manually?

(Aaron Mildenstein) #4

It would never be as fast as the aforementioned binlog approach. I have no idea what you would consider acceptable latency for a match between MySQL tables and an Elasticsearch index. Your initial post suggested you wanted this in real time. If that is not the case (that you need real-time updates published to Elasticsearch), then an approach like the one you mention, or even just sending upserts from your code (upsert = update if present, insert if absent), and/or deletes when necessary, may be sufficient for you.

(Ed) #5

Okay, I see. Thanks for the advice I'll try to implement the way you suggested.

(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.