Sync MySQL to Elasticsearch

Hello everyone,

In order to synchronise my MySQL database with Elasticsearch I set up the following process :

  1. I activated the binary log in MySQL in order to see the different transactions
  2. After that, I execute Logstash which read the different transactions that are stored in a file. The file that Logstash read looks like this :
INSERT INTO `mytable` (`field1`, `field2`, `field3`) VALUES (1, 'first value', 'second value')
UPDATE `mytable` SET `field1` = 'first value modified', `field2` = 'second value modified' WHERE `mytable`.`id` = 1
DELETE FROM `mytable` WHERE `mytable`.`id` = 1

I used the grok filter to check if the transaction is "INSERT", "UPDATE" or "DELETE" and after that I try to apply another filter to get the field's names and their values.

The problem is that the structure often changes, for example I could have :

UPDATE `mytable` SET `field1` = 'first value modified', `field2` = 'second value modified' WHERE `mytable`.`id` = 1
UPDATE `mytable` SET `field1` = 'first value modified'  WHERE `mytable`.`id` = 1

Here, you can notice that in the first update there is two fields that are updated and in the second line only one.

Do you know how can I solve my issue ? Do you think it's a good idea to process like this or do you have another suggestions?

Thanks in advance for your help.

Any particular reason you're not using the jdbc input plugin?

@magnusbaeck I don't use the jdbc input plugin because (if I well understand), it will reindex all my data. For example if I have 2 millions of lines in my table, I have to do : SELECT * FROM .... in order to see the DELETE operations for example.

Is there a way to tell old data from new data? Like an auto-incrementing id field (if rows aren't updated) or a timestamp that changes when a row is updated.

I'm not sure that is possible to know which data are been deleted because the "id" field is not an auto-increment but a specific id like this "d44fdkd_s5sd4".

Do you have any idea how can I sync my database with my elasticsearch?

Okay, you also have deletions. That clearly makes things more difficult. I guess I don't have any great suggestions then.

Ok ,thank you @magnusbaeck for your suggestions, I will try with these options :

  • jdbc input plugin
  • trigger in the database in order to execute command to Elasticsearch for each transaction
  • execute transaction in the database and in Elasticsearch at the same time.