Syncing Elasticsearch with MySQL database

Jette_Thrane · January 18, 2017, 7:42pm

Hi all

I am new to Elasticsearch and currently trying to plan the best setup and avoid driving into a corner.
I have read most of the documentation, and this very informative blogpost: https://www.elastic.co/blog/found-keeping-elasticsearch-in-sync

We have a database of user profiles and what we want is the ability to search based on stuff like gender, height and geo location. The geo location was the demand that triggered me to go look for something like Elasticsearch.

I plan to update the location of active users to MySQL every 5 minutes. Apart from that, users can of cause edit there profile whenever they feel like it.

If I understand the blogpost right, I should register the unique id of a profile every time data, which is also in Elasticsearch, is changed. Lets say I have a separat table for that called es_edits.

I then write a script to put in a cronjob. In this script I take out 1000 user ids from es_edits, fetch the data for these, and bulk send this update to ES. Upon success, the 1000 ids is deleted from the es_edits table.

The script continues to pull 1000 ids until the es_edits table is empty.

Does this sound reasonable so far?

Another concern is ... how often can/should I run this cronjob? Especially with the geo locations in mind. I would like this information to be up-to-date in ES as soon as possible. But I also don't want to slow ES down with too many updates. What are the pitfalls?

I should mention that we plan to setup ES as a cluster on 3 webservers.

EDIT: I have consulted an ES expert who advised me not to use bulk updates in this case. I was under the impression that one should always use bulk updates, but ES is quite good at handling "on-the-fly" changes to documents. Therefore I have chosen to update whenever a device reports a new location.

fabiocatalao · February 15, 2017, 4:46pm

I've currently replicating data from a MySql database to Elasticsearch, which is working as described below:

A Java app that reads MySql binary log files and send the data to Kafka. It uses https://github.com/shyiko/mysql-binlog-connector-java to read the logs.
A Java Kafka consumer which consumes Kafka topics, applies some transformation to data and send the results to Elasticsearch.

It is a little different from what you are trying to do, but it is a different possibility to take into account.

system · March 15, 2017, 4:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Strategy for keeping Elasticsearch updated with MySQL Elasticsearch	5	5373	July 6, 2017
How can I reach real-time index updating using ES + Logstash + MySQL when data was updated, added or deleted? Logstash	5	2668	October 6, 2017
Getting started with elasticsearch Elasticsearch	1	268	July 6, 2017
Sync Data from ES to MySQL Elasticsearch	7	2454	October 20, 2017
Using elasticsearch as mysql cache, is frequent index burden ES when documents is massive? Elasticsearch	6	1453	July 6, 2017

Syncing Elasticsearch with MySQL database

Related topics