What's the best way to sync data between MySQL and Elasticsearch

Hi All
We are using MySQL as primary db and Elasticsearch for secondary db(basically for search purpose). We are building a Job Portal. There are mainly three use cases where the search will be performed on Elasticsearch.

  1. Job Search
  2. CV bank Search
  3. Applicants Search/Match for a job post

For these use cases, we need to store data of Job Post (to perform Job Search), Applicants Current CV (to perform CV bank Search), Applicants CV when applied to a job post (to perform Applicants Search /Match for a job post). We have find out probable three ways to sync/store the data in MySQL and Elasticsearch.

  1. Use a Queue server(RabbitMQ or Kafka): When something is stored in MySQL, we will send message to Queue server which will retrieve and transform data and store into Elasticsearch.

  2. Use Logstash: Periodically, the logstash will be responsible to search MySQL DB / Table update and store data into Elasticsearch.

  3. Real time sync/store from application layer: We can store data in Elasticsearch just after we are storing data into MySQL from application layer/code level.

I know, all of these have pros and cons in terms of performance, real time sync/store, maintaining another server etc. But, I am not sure in which way to go? Can you please guide? Thanks in advance.


I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Basically, I'd recommend modifying the application layer if possible and send data to elasticsearch in the same "transaction" as you are sending your data to the database.

In my past experience, I was doing your option 1.

Thanks David (@dadoonet). I went through your blog. Excellent write up.

Can you share some experience on any problems you were facing using option 1 (Queue)?

Also, I think Logstash is not the use case on this one if thinking about real time scenario. Am I right?

And, using Application layer, should I be careful about anything else? e.g in production environment or complexity level etc?

1 and 3 are the same IMO because you are sending from the application layer right?

You can use Logstash but in another way.

I'd personally:

  • from the application layer, send to Kafka or Redis or any other message Q system
  • read with Logstash from Kafka and send to Elasticsearch

Logstash supports also failures, DLQ which is useful IMO.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.