What's the best way to sync data between MySQL and Elasticsearch

Shafath_Ahmed · February 3, 2020, 5:17am

Hi All
We are using MySQL as our primary database and Elasticsearch as a secondary store (basically for search). We are building a job portal. There are three main use cases where search will run against Elasticsearch.

Job Search
CV bank Search
Applicants Search/Match for a job post

For these use cases, we need to store Job Post data (for Job Search), each applicant's current CV (for CV bank Search), and the applicant's CV as it was when they applied to a job post (for Applicants Search/Match for a job post). We have identified three possible ways to keep the data in sync between MySQL and Elasticsearch.

Use a queue server (RabbitMQ or Kafka): When something is stored in MySQL, we send a message to the queue server; a consumer then retrieves it, transforms the data, and stores it in Elasticsearch.
Use Logstash: Logstash periodically checks the MySQL database/tables for updates and stores the changed data in Elasticsearch.
Real-time sync from the application layer: We write data to Elasticsearch right after writing it to MySQL, at the application/code level.
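
To make the periodic-polling idea in option 2 concrete, here is a minimal sketch of watermark-based polling. It is not Logstash itself: `sqlite3` stands in for MySQL, a plain dict stands in for the Elasticsearch index, and the `job_post` table, its `updated_at` column, and the function names are all assumptions made up for illustration.

```python
import sqlite3

def fetch_changed_rows(conn, last_seen):
    """Return rows modified after the watermark, oldest first."""
    cur = conn.execute(
        "SELECT id, title, updated_at FROM job_post "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    return cur.fetchall()

def sync_once(conn, index, last_seen):
    """Copy changed rows into the index and return the new watermark."""
    for row_id, title, updated_at in fetch_changed_rows(conn, last_seen):
        # Stand-in for an Elasticsearch index/upsert call (assumption).
        index[row_id] = {"title": title, "updated_at": updated_at}
        last_seen = updated_at
    return last_seen

# sqlite3 is used here only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_post (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO job_post VALUES (?, ?, ?)",
    [(1, "Backend Engineer", 100), (2, "Data Analyst", 105)],
)

index = {}
watermark = sync_once(conn, index, 0)  # first poll picks up both rows

# A later edit bumps updated_at, so the next poll sees only that row.
conn.execute(
    "UPDATE job_post SET title = ?, updated_at = ? WHERE id = ?",
    ("Senior Backend Engineer", 110, 1),
)
watermark = sync_once(conn, index, watermark)
```

Two limits of this approach are worth keeping in mind: changes only show up after the next poll, and rows deleted from the table are never seen by the query, so deletes need separate handling (for example a soft-delete flag).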

I know all of these have pros and cons in terms of performance, real-time sync, maintaining another server, etc., but I am not sure which way to go. Can you please guide me? Thanks in advance.

dadoonet · February 3, 2020, 5:45am

Welcome!

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Basically, I'd recommend modifying the application layer if possible and sending data to Elasticsearch in the same "transaction" as you send your data to the database.
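
One way to read the same-"transaction" suggestion is: write the row, index the document, and commit the database transaction only if indexing succeeded. The sketch below shows that ordering under stated assumptions: `sqlite3` stands in for MySQL, `index_document` is a hypothetical callable standing in for an Elasticsearch client call, and `save_job_post` and `SearchIndexUnavailable` are invented names.

```python
import sqlite3

class SearchIndexUnavailable(Exception):
    """Raised by the stand-in indexer when the search side is down."""

def save_job_post(conn, index_document, job_id, title):
    """Insert the row and index it; roll the row back if indexing fails."""
    try:
        conn.execute(
            "INSERT INTO job_post (id, title) VALUES (?, ?)", (job_id, title)
        )
        # Hypothetical hook; a real one would call the search cluster.
        index_document(job_id, {"title": title})
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.execute("CREATE TABLE job_post (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()

index = {}

def index_ok(doc_id, doc):
    index[doc_id] = doc

def index_down(doc_id, doc):
    raise SearchIndexUnavailable("search cluster unreachable")

save_job_post(conn, index_ok, 1, "Backend Engineer")
try:
    save_job_post(conn, index_down, 2, "Data Analyst")
except SearchIndexUnavailable:
    pass  # the insert for job 2 was rolled back

row_count = conn.execute("SELECT COUNT(*) FROM job_post").fetchone()[0]
```

This is not a true distributed transaction: if the commit itself fails after indexing has succeeded, the index holds a document the database does not, so some reconciliation is still needed.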

In my past experience, I was doing your option 1.

Shafath_Ahmed · February 3, 2020, 5:57am

Thanks David (@dadoonet). I went through your blog. Excellent write up.

Can you share any problems you ran into when using option 1 (Queue)?

Also, I think Logstash is not the right fit here if we need real-time sync. Am I right?

And if I go with the application layer, is there anything else I should be careful about, e.g. in a production environment or in terms of complexity?

dadoonet · February 3, 2020, 6:15am

Options 1 and 3 are the same IMO, because in both cases you are sending from the application layer, right?

You can use Logstash but in another way.

I'd personally:

from the application layer, send to Kafka or Redis or any other message Q system
read with Logstash from Kafka and send to Elasticsearch

Logstash also supports failure handling with a dead letter queue (DLQ), which is useful IMO.
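
The shape of that pipeline (application publishes, a consumer indexes, failures are set aside) can be sketched in a few lines. This is only an in-process illustration: `queue.Queue` stands in for Kafka or Redis, the `drain` function stands in for Logstash, a list stands in for the dead letter queue, and every name here is an assumption rather than part of any real client API.

```python
import json
import queue

def publish_change(q, doc_id, doc):
    """Application side: enqueue a change event instead of indexing directly."""
    q.put(json.dumps({"id": doc_id, "doc": doc}))

def drain(q, index_document, dead_letters):
    """Consumer side: index each event; park the ones that fail."""
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            return
        try:
            event = json.loads(raw)
            index_document(event["id"], event["doc"])
        except Exception as exc:
            # Keep the raw message and the reason so it can be replayed later.
            dead_letters.append((raw, repr(exc)))

q = queue.Queue()  # stand-in for the message queue (assumption)
index = {}
dead_letters = []

def index_document(doc_id, doc):
    index[doc_id] = doc  # stand-in for the Elasticsearch write (assumption)

publish_change(q, 1, {"title": "Backend Engineer"})
q.put("not json")  # a malformed message that cannot be indexed
publish_change(q, 2, {"title": "Data Analyst"})

drain(q, index_document, dead_letters)
```

The point of the dead-letter list is that one bad message does not block the ones behind it: the consumer keeps going, and the failed events stay available for inspection or replay.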

system · March 2, 2020, 6:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
