Syncing Elasticsearch with MySQL database

Hi all :slight_smile:

I am new to Elasticsearch and currently trying to plan the best setup and avoid driving into a corner.
I have read most of the documentation, and this very informative blogpost:

We have a database of user profiles and what we want is the ability to search based on stuff like gender, height and geo location. The geo location was the demand that triggered me to go look for something like Elasticsearch.

I plan to update the location of active users to MySQL every 5 minutes. Apart from that, users can of cause edit there profile whenever they feel like it.

If I understand the blogpost right, I should register the unique id of a profile every time data, which is also in Elasticsearch, is changed. Lets say I have a separat table for that called es_edits.

I then write a script to put in a cronjob. In this script I take out 1000 user ids from es_edits, fetch the data for these, and bulk send this update to ES. Upon success, the 1000 ids is deleted from the es_edits table.

The script continues to pull 1000 ids until the es_edits table is empty.

Does this sound reasonable so far?

Another concern is ... how often can/should I run this cronjob? Especially with the geo locations in mind. I would like this information to be up-to-date in ES as soon as possible. But I also don't want to slow ES down with too many updates. What are the pitfalls?

I should mention that we plan to setup ES as a cluster on 3 webservers.

EDIT: I have consulted an ES expert who advised me not to use bulk updates in this case. I was under the impression that one should always use bulk updates, but ES is quite good at handling "on-the-fly" changes to documents. Therefore I have chosen to update whenever a device reports a new location.

I've currently replicating data from a MySql database to Elasticsearch, which is working as described below:

  1. A Java app that reads MySql binary log files and send the data to Kafka. It uses to read the logs.
  2. A Java Kafka consumer which consumes Kafka topics, applies some transformation to data and send the results to Elasticsearch.

It is a little different from what you are trying to do, but it is a different possibility to take into account.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.