[JDBC River] - Unexpected behaviour with versioning

Hi,

I started to use the JDBC River plugin, but I'm not getting the expected
results. Let me explain.

I'm trying to achieve a very basic use case: I have a simple three-column
table in a MySQL DB, and I want the JDBC River to index its rows and pick up
any changes periodically by polling.

Based on the docs, I can create such a JDBC River using:

Once it is created, I get the OK confirmation JSON from ES, and in the logs
I can see that the River is working. The very first time it runs, I get all
the results properly (I'm only using a few records for the sake of
simplicity):

However, if I go to the DB, modify a record, and then go back to ES, the
modification is not reflected in the indexed data at all.

After some research, I found that value-change detection may not be
available in the plugin yet, so I tried again by modifying the id itself in
the DB, also without success.

So what am I doing wrong here? As far as I can tell from the plugin docs,
this should be a pretty straightforward scenario, but I just cannot make it
work.

Any help would be appreciated.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

The change detection is there. If you add a JDBC river, look at this
document:

curl localhost:9200/_river/my_jdbc_river/_custom/
{"_index":"_river","_type":"my_jdbc_river","_id":"_custom","_version":1,"exists":true,
"_source" :
{"jdbc":{"created":"2013-02-25T23:51:25.147Z","version":1,"digest":"zLHqWpENr/kn3MPk4Xww00J2zuNaOwdVnoNaIuHW7qI="}}}

In the "digest" field, you see a base64 value that is computed against
all the rows that have been fetched in the last round. Next round, a new
digest is computed and compared against the previous digest to detect a
change.
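
To illustrate the idea of a per-round digest, here is a minimal Python
sketch. It is not the plugin's code: the use of SHA-256, the way rows are
serialized, the sample rows, and the names rows_digest and has_changed are
all assumptions made for illustration only.

```python
import base64
import hashlib


def rows_digest(rows):
    """Return a base64 digest over all rows fetched in one round.

    Assumption: SHA-256 over a simple text serialization of each row;
    the plugin's real hashing and serialization may differ.
    """
    h = hashlib.sha256()
    for row in rows:
        # Sort columns by name so the digest does not depend on dict ordering.
        for column, value in sorted(row.items()):
            h.update(f"{column}={value!r};".encode("utf-8"))
        h.update(b"\n")
    return base64.b64encode(h.digest()).decode("ascii")


def has_changed(previous_digest, rows):
    """Compare the digest of the current round with the previous one."""
    current = rows_digest(rows)
    return current != previous_digest, current


# Hypothetical sample rows for two polling rounds.
round1 = [{"id": 1, "fname": "George", "lname": "Smith"}]
round2 = [{"id": 1, "fname": "Will", "lname": "Smith"}]

_, digest1 = has_changed(None, round1)
unchanged, _ = has_changed(digest1, round1)
changed, _ = has_changed(digest1, round2)
print(unchanged, changed)  # False True
```

Under these assumptions, a digest that stays the same between two rounds
means the fetched rows were identical, so nothing needs to be re-indexed.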

Note that the default strategy is "oneshot", which runs the river only once,
to keep river usage simple for demonstration.

To activate the rounds, you must change the "strategy" parameter to
"simple", and set the "poll" interval (default is "1h").

It would help if you can copy your statements here so I can understand
better what you want to achieve.
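
As a rough illustration of the "strategy" and "poll" settings mentioned
above, the following Python sketch builds a river definition as JSON. Only
the two parameter names and the values "simple" and "1h" come from this
message; the placement of the keys inside a "jdbc" object, the "5s"
interval, and every other field (driver, url, user, password, sql, the
users table) are assumptions that should be checked against the plugin
documentation for your version.

```python
import json

# Hypothetical river definition. Only "strategy" and "poll" are named in the
# message above; where they live in the JSON and all other fields are
# assumptions for illustration.
river_definition = {
    "type": "jdbc",
    "jdbc": {
        "strategy": "simple",  # run in repeated rounds instead of "oneshot"
        "poll": "5s",          # example interval between rounds (default "1h")
        "driver": "com.mysql.jdbc.Driver",
        "url": "jdbc:mysql://localhost:3306/test",
        "user": "",
        "password": "",
        "sql": "SELECT id AS _id, fname, lname FROM users",
    },
}

# Serialize the definition; this string would be the request body.
body = json.dumps(river_definition, indent=2)
print(body)
```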

Best regards,

Jörg

On 25.02.13 11:47, Enrique Medina Montenegro wrote:

> After some research, I found that value-change detection may not be
> available in the plugin yet, so I tried again by modifying the id itself
> in the DB, also without success.
>
> So what am I doing wrong here? As far as I can tell from the plugin
> docs, this should be a pretty straightforward scenario, but I just
> cannot make it work.

--

Hi,

I have the same issue and was wondering how this was resolved.

I have this users table with id, fname, lname and update_date.

I am using this statement

"SELECT *, id AS _id, CONCAT(fname, ' ', lname) AS member_name FROM users
WHERE (create_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 HOUR) AND NOW()) OR
(update_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 HOUR) AND NOW())"

and using strategy => simple, poll => 5s

When I update a row in the DB, say changing fname from "George" to "Will",
the indexed document is not updated. Am I missing something here? I have
already set an _id field.

On Tuesday, February 26, 2013 8:03:10 AM UTC+8, Jörg Prante wrote:

> To activate the rounds, you must change the "strategy" parameter to
> "simple", and set the "poll" interval (default is "1h").

--

So it seems that the digest is not changing even if there are changes made
in the database. I'm using v2.0.3.

Is this a bug?

--

Thanks for reporting, I will investigate.

Jörg

On 20.03.13 05:02, lyndon.camson@upraxis.com wrote:

> So it seems that the digest is not changing even if there are changes made
> in the database. I'm using v2.0.3.
>
> Is this a bug?

--

Thanks. One question about this passage on your wiki:

"If a document ID is no longer provided by the database between river runs,
Elasticsearch will not index. As a result, the version of the document in
the index gets lower than the actual version managed by the river run, and
a housekeeping procedure is required to remove such obsolete documents."

My SQL statement returns only a subset of the whole table. If the other
rows are not updated, does that mean that your plugin will delete those
IDs?

Thanks again.

On Wednesday, March 20, 2013 7:02:52 PM UTC+8, Jörg Prante wrote:

> Thanks for reporting, I will investigate.

--

I truly admit that the concept of JDBC river updates with versions is not
very mature and lacks many features. It assumes a simple full-set approach,
i.e. move the data into ES, stamp the documents with versions, move the
next data into ES, stamp again, and then remove the documents with older
versions. If the SQL statements of the second run intentionally skip data
from the first run, there is no chance to detect version changes correctly.
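
The full-set approach described above can be simulated with a small Python
sketch. This is not the plugin's implementation: the in-memory index, the
function name river_round, and the sample rows are hypothetical. It only
shows why a second run that returns a subset causes the other documents to
be treated as obsolete.

```python
def river_round(index, rows, version):
    """One full-set round: stamp fetched rows, then remove stale documents.

    `index` maps _id -> {"version": int, "source": dict}. All names here
    are hypothetical; this is not the plugin's implementation.
    """
    # Index every fetched row and stamp it with the current round version.
    for row in rows:
        index[row["_id"]] = {"version": version, "source": row}
    # Housekeeping: documents not stamped in this round are obsolete.
    stale = [doc_id for doc_id, doc in index.items() if doc["version"] < version]
    for doc_id in stale:
        del index[doc_id]
    return stale


index = {}
full_set = [
    {"_id": 1, "fname": "George"},
    {"_id": 2, "fname": "Anna"},
]
river_round(index, full_set, version=1)

# Second round with a subset query: only the recently updated row is returned.
subset = [{"_id": 1, "fname": "Will"}]
removed = river_round(index, subset, version=2)

print(removed)        # [2]
print(sorted(index))  # [1]
```

In this simulation, document 2 was never changed in the database, yet it is
removed because the second query did not return it.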

Jörg

On 21.03.13 06:07, lyndon.camson@upraxis.com wrote:

> My SQL statement returns only a subset of the whole table. If the other
> rows are not updated, does that mean that your plugin will delete those
> IDs?

--

I understand. I am new to both Java and Elasticsearch. Perhaps I need a
custom solution for what I am trying to achieve, which is to index only the
data created or updated within a time interval. It would be a nice
additional feature for the jdbc-river plugin to do "upserts" for SQL queries
that fetch subsets, keyed only on the _id field, because querying the whole
set from MySQL would be very slow.
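
The kind of "upsert" behaviour I have in mind could look roughly like the
Python sketch below. It is only an illustration with hypothetical names
(upsert_round, an in-memory index, made-up rows), not a proposal for how
the plugin should be implemented.

```python
def upsert_round(index, rows):
    """Insert or overwrite documents by _id; never delete missing ones.

    `index` maps _id -> document. Hypothetical helper for illustration.
    """
    for row in rows:
        # Copy the row so later changes to the input do not affect the index.
        index[row["_id"]] = dict(row)
    return index


index = {}
# First round: the time-window query happens to return both rows.
upsert_round(index, [{"_id": 1, "fname": "George"}, {"_id": 2, "fname": "Anna"}])
# Second round: only the row updated within the window is returned.
upsert_round(index, [{"_id": 1, "fname": "Will"}])

print(index[1]["fname"], sorted(index))  # Will [1, 2]
```

The trade-off is that rows deleted from the database would never be removed
from the index, because a missing _id no longer means anything.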

I hope you will consider it.

Thanks again.
