JDBC River - Which approach


(Prasanth Nair) #1

All,

I'm working on creating a jdbc river which essentially connects to a Mysql
table (millions of rows, where updates / addition of rows will be very
common). Having said that, I would like to get suggestion on what would be
an ideal mechanism for search index and mysql to be in sync (if possible,
near real time). I tried versioning and update table approach but would
like to know are there some best practices on above requirement.

thanks for helping

prash

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I would not recommend here to use a river as you need "real time" but I would push directly from the source (service layer) as soon as I update something in the database.

Makes sense?

What techno stack do you have?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 13 sept. 2013 à 05:43, Prasanth Nair pn@leapcourse.com a écrit :

All,

I'm working on creating a jdbc river which essentially connects to a Mysql table (millions of rows, where updates / addition of rows will be very common). Having said that, I would like to get suggestion on what would be an ideal mechanism for search index and mysql to be in sync (if possible, near real time). I tried versioning and update table approach but would like to know are there some best practices on above requirement.

thanks for helping

prash

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #3

JDBC river is for never changing or slow changing data, main purpose is
demo mode (moving data from RDBMS to ES).

If you want sync, you could use MySQL triggers. Simply said, if you have
few changes and don"t want to spend much effort, use sys_exec in a MySQL
trigger to push the change with curl into the ES REST API.

If you want a more sophisticated method or if you have lots of thousands of
changes, a sys_exec from a trigger obviously wouldn't scale. In such case,
I would try writing a binlog based pusher in Java using
https://code.google.com/p/open-replicator/ But, the pusher must know from
the MySQL operation stream what data to push to ES and how, which strongly
depends on the application.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Prasanth Nair) #4

Thanks David for the thoughts,

I'm using spring / jaxrs service layer and PHP frontend. I see where you are coming from and our original thought was to use an event queue (MQ) to propagate DB writes to ES, it looks like I got overexcited by River possibility :slight_smile:

--
prasanth nair
On 13 September 2013 at 10:46:13 AM, David Pilato (david@pilato.fr) wrote:
I would not recommend here to use a river as you need "real time" but I would push directly from the source (service layer) as soon as I update something in the database.

Makes sense?

What techno stack do you have?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 13 sept. 2013 à 05:43, Prasanth Nair pn@leapcourse.com a écrit :

All,

I'm working on creating a jdbc river which essentially connects to a Mysql table (millions of rows, where updates / addition of rows will be very common). Having said that, I would like to get suggestion on what would be an ideal mechanism for search index and mysql to be in sync (if possible, near real time). I tried versioning and update table approach but would like to know are there some best practices on above requirement.

thanks for helping

prash

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/yZYBz2rIBxw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #5

I ended up pushing my index/update requests to ActiveMQ :wink:

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 13 sept. 2013 à 13:42, prasanth nair pn@leapcourse.com a écrit :

Thanks David for the thoughts,

I'm using spring / jaxrs service layer and PHP frontend. I see where you are coming from and our original thought was to use an event queue (MQ) to propagate DB writes to ES, it looks like I got overexcited by River possibility :slight_smile:

--
prasanth nair
On 13 September 2013 at 10:46:13 AM, David Pilato (david@pilato.fr) wrote:

I would not recommend here to use a river as you need "real time" but I would push directly from the source (service layer) as soon as I update something in the database.

Makes sense?

What techno stack do you have?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 13 sept. 2013 à 05:43, Prasanth Nair pn@leapcourse.com a écrit :

All,

I'm working on creating a jdbc river which essentially connects to a Mysql table (millions of rows, where updates / addition of rows will be very common). Having said that, I would like to get suggestion on what would be an ideal mechanism for search index and mysql to be in sync (if possible, near real time). I tried versioning and update table approach but would like to know are there some best practices on above requirement.

thanks for helping

prash

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/yZYBz2rIBxw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Prasanth Nair) #6

Thanks Joerg, I think I got a fair idea about DOs and DONTS of river. I will stick with managing this through app layer

--
prasanth nair
On 13 September 2013 at 12:46:10 PM, joergprante@gmail.com (joergprante@gmail.com) wrote:
JDBC river is for never changing or slow changing data, main purpose is demo mode (moving data from RDBMS to ES).

If you want sync, you could use MySQL triggers. Simply said, if you have few changes and don"t want to spend much effort, use sys_exec in a MySQL trigger to push the change with curl into the ES REST API.

If you want a more sophisticated method or if you have lots of thousands of changes, a sys_exec from a trigger obviously wouldn't scale. In such case, I would try writing a binlog based pusher in Java using https://code.google.com/p/open-replicator/ But, the pusher must know from the MySQL operation stream what data to push to ES and how, which strongly depends on the application.

Jörg

--
You received this message because you are subscribed to a topic in the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/yZYBz2rIBxw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ümit Seren) #7

I did create a river with "oneshot" strategy to initially sync the data
between my PostgreSQL db and ES.
For changes afterwards I sync them using the service layer (so whenever an
entity is changed, it will be changed in the DB and in ES).

On Friday, September 13, 2013 5:43:41 AM UTC+2, Prasanth Nair wrote:

All,

I'm working on creating a jdbc river which essentially connects to a Mysql
table (millions of rows, where updates / addition of rows will be very
common). Having said that, I would like to get suggestion on what would be
an ideal mechanism for search index and mysql to be in sync (if possible,
near real time). I tried versioning and update table approach but would
like to know are there some best practices on above requirement.

thanks for helping

prash

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(system) #8