Strategy for keeping Elasticsearch updated with MySQL


(arthurx) #1

Hello! I use MySQL as my primary datastore and use Elasticsearch to further
index the documents.
My problem is keeping the data in ES in sync with MySQL.

Currently I have two methods in mind:

  1. Whenever I add or update an entry in MySQL, apply the same change in ES.
  2. Run cron jobs that periodically bring ES back in sync with the data in
    MySQL.

For method 2, I wonder how I can check whether an entry is already indexed in
Elasticsearch. And would it be efficient at all if I have to check every
entry to see whether it has been updated?

I am new to the technology and I am afraid I have missed some really obvious
and established solution here. If not, what is the "normal" way this
situation is handled?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/55d842e5-277f-4d24-b5a9-8be5b5544dbc%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I would do 1/ to get closer to near-real-time search.
Also, I like the idea that I have an object in memory and simply push it to MySQL and to ES at the same time. There is no need to read the object back from MySQL to index it in another process (option 2).

That said, you could also put a message queue in the middle if you want to be able to stop your ES cluster at some point without stopping your application.
This is what I did in the past.
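
A minimal sketch of the queue-in-the-middle idea, under loud assumptions: plain dicts stand in for MySQL and for the ES index, Python's in-process `queue.Queue` stands in for a real message broker, and the function names (`save_article`, `delete_article`, `drain`) are hypothetical. It only shows the shape of the flow: the application writes to the primary store and enqueues an event, and a separate consumer applies the events to the search index whenever it is available.

```python
import queue


def save_article(db, events, article):
    """Write to the primary store, then enqueue an indexing event."""
    db[article["id"]] = article  # stand-in for the MySQL INSERT/UPDATE
    events.put(("index", article["id"], article))


def delete_article(db, events, article_id):
    """Delete from the primary store, then enqueue a delete event."""
    db.pop(article_id, None)  # stand-in for the MySQL DELETE
    events.put(("delete", article_id, None))


def drain(events, index):
    """Apply every queued event to the search index; return how many ran."""
    applied = 0
    while True:
        try:
            op, doc_id, doc = events.get_nowait()
        except queue.Empty:
            return applied
        if op == "index":
            index[doc_id] = doc  # stand-in for an ES index request
        else:
            index.pop(doc_id, None)  # stand-in for an ES delete request
        applied += 1


db, index, events = {}, {}, queue.Queue()

# The application keeps working while the search cluster is "stopped":
save_article(db, events, {"id": 1, "title": "Hello"})
save_article(db, events, {"id": 2, "title": "World"})
delete_article(db, events, 1)
pending = events.qsize()  # events simply wait in the queue

# Once the cluster is back, the consumer catches up.
applied = drain(events, index)
```

With a real broker the consumer would run as its own process, so it can be stopped and restarted without touching the application.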

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to arthurX (fc28222@gmail.com), 8 January 2014 at 20:13:40



(Николай Колев) #3

Hi Arthur,

I did something similar years ago when I was working for a newspaper.
We kept articles in a database, and full-text search was handled by an
external program. There was a trigger on the article tables that, on every
change operation, added a record to a queue table. Something like this:
article_id, operation_type, table_name
Then there was a cron job that ran every minute, read from this table, and:

  • On delete: deleted the entry
  • On update: deleted the entry, generated a new simple page for the
    article (title and content only), and passed it to the indexer
  • On insert: generated a new simple page for the new article (title
    and content only) and passed it to the indexer

Articles were placed in a directory like this:
/<root_dir>/<table_name>//.html. This path was then returned
and easily parsed to generate the appropriate link to the article.

After each successful operation, the corresponding record was removed from
the queue, which gave us near-real-time indexing.

This can be done with ES too, and much more easily.
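
The trigger-plus-queue-table pattern described above can be sketched roughly as follows. Everything here is an assumption for illustration: an in-memory SQLite database stands in for MySQL (the trigger syntax differs slightly between the two), a plain dict stands in for the search index, and the table, trigger, and function names are made up. The `process_queue` function plays the role of the cron job.

```python
import sqlite3

# Hypothetical schema: one content table, one queue table, three triggers.
SCHEMA = """
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
CREATE TABLE index_queue (
    queue_id INTEGER PRIMARY KEY AUTOINCREMENT,
    article_id INTEGER, operation_type TEXT, table_name TEXT
);
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
    INSERT INTO index_queue (article_id, operation_type, table_name)
    VALUES (NEW.id, 'insert', 'articles');
END;
CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
    INSERT INTO index_queue (article_id, operation_type, table_name)
    VALUES (NEW.id, 'update', 'articles');
END;
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
    INSERT INTO index_queue (article_id, operation_type, table_name)
    VALUES (OLD.id, 'delete', 'articles');
END;
"""


def process_queue(conn, index):
    """One 'cron run': apply queued changes to the index, oldest first."""
    rows = conn.execute(
        "SELECT queue_id, article_id, operation_type "
        "FROM index_queue ORDER BY queue_id"
    ).fetchall()
    for queue_id, article_id, operation_type in rows:
        if operation_type == "delete":
            index.pop(article_id, None)
        else:
            row = conn.execute(
                "SELECT title, content FROM articles WHERE id = ?",
                (article_id,),
            ).fetchone()
            # The row may already be gone if a later delete is queued.
            if row is not None:
                index[article_id] = {"title": row[0], "content": row[1]}
        # Remove the queue record only after the change has been applied.
        conn.execute("DELETE FROM index_queue WHERE queue_id = ?", (queue_id,))
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO articles VALUES (1, 'First', 'a')")
conn.execute("INSERT INTO articles VALUES (2, 'Second', 'b')")
conn.execute("UPDATE articles SET title = 'First, revised' WHERE id = 1")
conn.execute("DELETE FROM articles WHERE id = 2")

search_index = {}
processed = process_queue(conn, search_index)
```

Because the job re-reads the current row instead of storing a copy in the queue, replaying the same queue record twice is harmless.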

Best regards,
Nickolay Kolev

In reply to arthurX, Wednesday, 8 January 2014, 21:13:35 UTC+2



(Norberto Meijome) #4

+1. Having a queue and consumers between your source of truth and ES is a
great approach. You can decouple and independently scale (and stop when
needed, as DP said) the different components, minimising the impact on your
users.

In reply to David Pilato (david@pilato.fr), 09/01/2014 7:35 AM



(Jörg Prante) #5

In the case of MySQL, you can write a listener for the MySQL binlog that
indexes and deletes rows, provided your data model has a 1:1 relationship
between SQL rows and ES docs.
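
As a rough sketch of the indexing half of such a listener: the function below turns row-change events into the newline-delimited body expected by the Elasticsearch bulk API. The event shape (`kind`, `pk`, `row`) and the function name are assumptions for illustration; reading the binlog itself is left out and would need a replication client library. The metadata omits a document type, which matches recent ES versions but not those current when this thread was written.

```python
import json


def to_bulk_body(events, index_name):
    """Map 1:1 row events to bulk API lines (action line, then source)."""
    lines = []
    for event in events:
        meta = {"_index": index_name, "_id": str(event["pk"])}
        if event["kind"] == "delete":
            lines.append(json.dumps({"delete": meta}))
        else:
            # Inserts and updates both reindex the full row.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(event["row"]))
    # The bulk API requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical events, as a binlog listener might hand them over.
events = [
    {"kind": "insert", "pk": 1, "row": {"title": "Hello"}},
    {"kind": "update", "pk": 1, "row": {"title": "Hello again"}},
    {"kind": "delete", "pk": 2, "row": None},
]
body = to_bulk_body(events, "articles")
```

The resulting string would be sent as the body of a `POST /_bulk` request.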

If that is not possible because your model is based on table joins, and you
can live with a single-node (non-scalable) solution, you may look at the
JDBC river (https://github.com/jprante/elasticsearch-river-jdbc), which was
built for demonstration purposes.

To handle docs that may already be indexed, you should use the update
operation with the versioning feature of ES. Behind the scenes, an update is
an index operation preceded by a read. It is efficient because you save an
extra round trip between client and server.
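
To illustrate why versioning removes the need to look a document up before writing it, here is a small in-memory model of the rule Elasticsearch applies to externally versioned index requests (`version_type=external`): a write is accepted only if its version is higher than the stored one. Note that this models the index operation with external versions rather than the update API mentioned above, and it is not ES client code. The class, the function, and the use of a MySQL `updated_at` value as the version number are all assumptions for illustration.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer."""


class VersionedIndex:
    """Toy model of an index that enforces external versioning."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(f"doc {doc_id}: {version} <= {current[0]}")
        self._docs[doc_id] = (version, source)

    def get(self, doc_id):
        entry = self._docs.get(doc_id)
        return None if entry is None else entry[1]


def sync_rows(index, rows):
    """Blindly push every row; the version check skips unchanged ones."""
    written = 0
    for row in rows:
        try:
            index.index(row["id"], {"title": row["title"]}, row["updated_at"])
            written += 1
        except VersionConflict:
            pass  # already indexed at this version or a newer one
    return written


idx = VersionedIndex()
rows = [
    {"id": 1, "title": "Hello", "updated_at": 100},
    {"id": 2, "title": "World", "updated_at": 100},
]
first_run = sync_rows(idx, rows)   # both rows are new

rows[0] = {"id": 1, "title": "Hello again", "updated_at": 250}
second_run = sync_rows(idx, rows)  # only the changed row is written
```

The sync job never asks "is this doc indexed?"; it sends each row with its version and lets the index reject anything stale.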

Jörg


