How to update/delete indexed documents from ES index using mysql jdbc river


(dark_shadow) #1

Hi,

I'm using jdbc mysql river plugin
https://github.com/jprante/elasticsearch-river-jdbc for creating ES index.
I have been able to index my documents successfully but I'm facing issues
in updating/deleting indexed documents. My jdbc river is used with a sql
query that uses multiple joins on tables and return results. These results
are then indexed in ES. My problem is if I update some tables they will
affect the results of that join query which should be reflected in ES index
but ES index is not updating/deleting sql results from that join query. I
found few threads where people are facing similar issue.

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more
thread from author itself where he states that deletions are no longer
supported. He tells two methods of tackling the issue. One is re indexing
itself and second is using some sql queries to update/delete indexed
documents.

Can anyone please tell me how can I tackle the issue. How can I
update/delete already indexed documents. Can anyone please elaborate on the
second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Can you give a minimal example of a query with the rows, and what rows are
deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer available
for creating deletion requests and so they can not be tracked over time -
once they are gone, they are gone. The problem is known as "stale data".
This can be solved either at a bigger scope (by using time windowed indexes
where older indexes can be dropped) or by an extra DB mechanism to provide
the IDs of the deleted docs after they are deleted (maybe by trigger), so
they can be selected by JDBC plugin with a "select _optype, _id"
construction. Note, at a certain size, deleting single docs in ES is not
efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough
(it is impossible to implement app-specific logic in JDBC plugin). So you
should also consider to write a middleware app with specific logic that
controls the deletions in the DB and after that deletes docs in ES.

Jörg

On Thu, Aug 7, 2014 at 10:44 AM, coder mukulnitkkr@gmail.com wrote:

Hi,

I'm using jdbc mysql river plugin
https://github.com/jprante/elasticsearch-river-jdbc for creating ES
index. I have been able to index my documents successfully but I'm facing
issues in updating/deleting indexed documents. My jdbc river is used with a
sql query that uses multiple joins on tables and return results. These
results are then indexed in ES. My problem is if I update some tables they
will affect the results of that join query which should be reflected in ES
index but ES index is not updating/deleting sql results from that join
query. I found few threads where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more
thread from author itself where he states that deletions are no longer
supported. He tells two methods of tackling the issue. One is re indexing
itself and second is using some sql queries to update/delete indexed
documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I
update/delete already indexed documents. Can anyone please elaborate on the
second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHPW5DaPGoewQJ6Qp0bBWubuFfaDUZ1x1eqOLrnZo9LrQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(dark_shadow) #3

Here goes my use case:

Table t1 --- >
id a b
123 somestring somestring

Table t2 --->
id c d e
123 someIntegerCount somebooleanValue someString
select * from t1,t2 where t1.id=t2.id and t2.c > 0 and t2.d = 1;

which gives some rows as:

id a b c
d e
123 someString someString someIntegerCount somebooleanValue
someString
Now, In my use case the values of c and d fields in table t2 keeps changing
frequently. So, I index only those rows for which c field > 0 (as count
keeps changing) and d field = 1 (which means either enabled or disabled).

Now, First time indexing is done without any issues. Problem comes when I
update these two fields and want ES to reindex the documents. Since there
might be some documents for which earlier c field was 0 but now it is
non-zero and similarly d field was 0 earlier but changed to 1. Now, I want
ES ti reflect those changes.(I guess that what mongo river does and I
expect mysql to work, automatic sync). Also, there will be few results
which were coming earlier but not now. How to delete those docs from index ?

How can I accomplish this ?

I have tried to explain my problem in simplest manner by keeping things
simple. Please ask questions if anything is not clear.

Thanks

On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:

Can you give a minimal example of a query with the rows, and what rows are
deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer
available for creating deletion requests and so they can not be tracked
over time - once they are gone, they are gone. The problem is known as
"stale data". This can be solved either at a bigger scope (by using time
windowed indexes where older indexes can be dropped) or by an extra DB
mechanism to provide the IDs of the deleted docs after they are deleted
(maybe by trigger), so they can be selected by JDBC plugin with a "select
_optype, _id" construction. Note, at a certain size, deleting single docs
in ES is not efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough
(it is impossible to implement app-specific logic in JDBC plugin). So you
should also consider to write a middleware app with specific logic that
controls the deletions in the DB and after that deletes docs in ES.

Jörg

On Thu, Aug 7, 2014 at 10:44 AM, coder <mukul...@gmail.com <javascript:>>
wrote:

Hi,

I'm using jdbc mysql river plugin
https://github.com/jprante/elasticsearch-river-jdbc for creating ES
index. I have been able to index my documents successfully but I'm facing
issues in updating/deleting indexed documents. My jdbc river is used with a
sql query that uses multiple joins on tables and return results. These
results are then indexed in ES. My problem is if I update some tables they
will affect the results of that join query which should be reflected in ES
index but ES index is not updating/deleting sql results from that join
query. I found few threads where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more
thread from author itself where he states that deletions are no longer
supported. He tells two methods of tackling the issue. One is re indexing
itself and second is using some sql queries to update/delete indexed
documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I
update/delete already indexed documents. Can anyone please elaborate on the
second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #4

Only a quick thought, it seems you have a large table in RDBMS, and there
are selectors ('where' condition) that control the dissemanition, what rows
of the table have to be selected for further processing.

A first thought is to pull the whole database into ES and build filtered
queries for the 'where' condition.

If the database table is steadily growing, it could be more feasible to
fetch them by timestamp, e.g. once a day or per hour.

Jörg

On Thu, Aug 7, 2014 at 6:19 PM, coder mukulnitkkr@gmail.com wrote:

Here goes my use case:

Table t1 --- >
id a b
123 somestring somestring

Table t2 --->
id c d e
123 someIntegerCount somebooleanValue someString
select * from t1,t2 where t1.id=t2.id and t2.c > 0 and t2.d = 1;

which gives some rows as:

id a b c
d e
123 someString someString someIntegerCount somebooleanValue
someString
Now, In my use case the values of c and d fields in table t2 keeps
changing frequently. So, I index only those rows for which c field > 0 (as
count keeps changing) and d field = 1 (which means either enabled or
disabled).

Now, First time indexing is done without any issues. Problem comes when I
update these two fields and want ES to reindex the documents. Since there
might be some documents for which earlier c field was 0 but now it is
non-zero and similarly d field was 0 earlier but changed to 1. Now, I want
ES ti reflect those changes.(I guess that what mongo river does and I
expect mysql to work, automatic sync). Also, there will be few results
which were coming earlier but not now. How to delete those docs from index ?

How can I accomplish this ?

I have tried to explain my problem in simplest manner by keeping things
simple. Please ask questions if anything is not clear.

Thanks

On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:

Can you give a minimal example of a query with the rows, and what rows
are deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer
available for creating deletion requests and so they can not be tracked
over time - once they are gone, they are gone. The problem is known as
"stale data". This can be solved either at a bigger scope (by using time
windowed indexes where older indexes can be dropped) or by an extra DB
mechanism to provide the IDs of the deleted docs after they are deleted
(maybe by trigger), so they can be selected by JDBC plugin with a "select
_optype, _id" construction. Note, at a certain size, deleting single docs
in ES is not efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough
(it is impossible to implement app-specific logic in JDBC plugin). So you
should also consider to write a middleware app with specific logic that
controls the deletions in the DB and after that deletes docs in ES.

Jörg

On Thu, Aug 7, 2014 at 10:44 AM, coder mukul...@gmail.com wrote:

Hi,

I'm using jdbc mysql river plugin https://github.com/jprante/
elasticsearch-river-jdbc for creating ES index. I have been able to
index my documents successfully but I'm facing issues in updating/deleting
indexed documents. My jdbc river is used with a sql query that uses
multiple joins on tables and return results. These results are then indexed
in ES. My problem is if I update some tables they will affect the results
of that join query which should be reflected in ES index but ES index is
not updating/deleting sql results from that join query. I found few threads
where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-
mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more
thread from author itself where he states that deletions are no longer
supported. He tells two methods of tackling the issue. One is re indexing
itself and second is using some sql queries to update/delete indexed
documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I
update/delete already indexed documents. Can anyone please elaborate on the
second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%
40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGF0%2BzKfmtxfM_yaVpG8%2BnhHXmUztNQOuOOJehUt9K6-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(system) #5