Hi Jörg,
Thanks a lot for replying! I'll ask some more questions inline.
On Thu, Feb 14, 2013 at 5:06 PM, Jörg Prante joergprante@gmail.com wrote:
Hi Radu,
it is only sparsely documented, but since 2.0.0 the JDBC river implements
an "acknowledge statement" mechanism.
The idea is that two SQL operations are executed per cycle:
- one that selects rows from the RDBMS and constructs JSON objects
- a second "acknowledge statement" that deletes or updates those rows
This "handshake" method must use a "job ID" column in the RDBMS data
(which is nothing but a version number in most cases).
The two statements are executed on two JDBC connections, a read and a
write connection, to ease the transaction management on the RDBMS side.
There are almost no limits for the "job ID" column (but some DBs, not
Oracle, may not easily move large result sets due to JDBC-driver-dependent
oddities). The only task is to set up a suitable time interval
with the "poll" parameter in the river and a mechanism on the DB side to
generate the "job IDs" in sequence and to create the data to be pushed.
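As a rough illustration of the select/acknowledge handshake described above, here is a minimal sketch of one poll cycle. It is not the river's own code (the river takes its statements as SQL configuration); it uses Python's `sqlite3` as a stand-in for Oracle over JDBC, and the table `outbox`, its columns, and the function `run_cycle` are all hypothetical names chosen for the example.

```python
import json
import sqlite3


def run_cycle(read_conn, write_conn, job_id):
    """Run one poll cycle for a single job ID (hypothetical schema)."""
    # Statement 1: select the rows that belong to this job ID.
    rows = read_conn.execute(
        "SELECT id, name FROM outbox WHERE job_id = ? ORDER BY id",
        (job_id,),
    ).fetchall()

    # Build one JSON object per row; a real setup would send these to ES.
    docs = [json.dumps({"_id": row[0], "name": row[1]}) for row in rows]

    # Statement 2: the "acknowledge statement". Here it deletes the rows
    # just read; an UPDATE that marks them as processed would also work.
    write_conn.execute("DELETE FROM outbox WHERE job_id = ?", (job_id,))
    write_conn.commit()
    return docs
```

Passing separate read and write connections mirrors the two-connection setup mentioned above; for a quick local experiment the same connection can be passed for both arguments.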
I'm not sure I followed. So I guess I'll have to make a new column for
every table named jobID or something like that, which will be updated with
every operation. That seems clear.
Then, how do I create the data that needs to be pushed? Do the joins and
put that data in a different table, so the JDBC river can pull it using the
"table" strategy?
If yes, then does the JDBC river automatically remove processed records
from that table? Or should I do that manually?
If none of the above should be done, I guess that table will pretty much
duplicate the data from the DB. Or am I missing something?
The JDBC river relies on the database connectivity given by JDBC. You are
right about the MongoDB log-based approach; it is the most attractive fit
for the ES river philosophy. With Oracle Streams, it should be possible to
use JMS to connect to Oracle in a stream-like fashion. But there is no JMS
river yet:
https://github.com/elasticsearch/elasticsearch/issues/1109
Thanks for the pointer. At this point I see two "paths":
- build ES documents from the DB every time, and try to filter by
version/timestamp so I won't have to build all documents. Then re-index
them
- keep track of all changes for every table. Then for every change use the
Update API to apply each change to the corresponding ES document(s). In my
use-case I can easily track the ES ID where each row should go, so that
might work. The downside of this is having to apply updates one by one (no
bulk)
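The first path (filter by version, then re-index in bulk) could be sketched roughly as follows. The row layout (`id`, `version`, `doc`) and both function names are assumptions made for this example; the only part taken from Elasticsearch is the bulk request body format, where each action line is followed by its document source line. Depending on the Elasticsearch version, a `_type` may also be needed in the action line or in the request URL.

```python
import json


def changed_since(rows, last_seen_version):
    """Keep only rows whose version is newer than the last one indexed."""
    return [row for row in rows if row["version"] > last_seen_version]


def build_bulk_body(rows, index_name):
    """Build a newline-delimited bulk body that re-indexes each row's document."""
    lines = []
    for row in rows:
        # Action line: which index and document ID to write to.
        action = {"index": {"_index": index_name, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        # Source line: the full document that replaces the old one.
        lines.append(json.dumps(row["doc"]))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n") if lines else ""
```

The resulting string would be sent as the body of a single bulk request, so one round trip covers all documents changed since the last run.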
Is there another option?
Best regards,
Radu
Best regards,
Jörg
On 14.02.13 14:58, Radu Gheorghe wrote:
Hello,
I have quite a lot of data in Oracle and I need to keep all its data
indexed in ES. I'm having trouble finding a good solution, so I hope you
guys can give me some feedback.
My use-case goes like this:
- there are lots of rows (hundreds of millions) in quite a lot of tables
(10-20)
- the resulting documents are rather large, because they contain data
from all those tables. What's more, they're structured and there are some
arrays. Like, the document is the user, and the user can belong to multiple
groups
- so far the best fit seems to be Jörg's JDBC river[0], but I'm a bit
worried about managing updates. The "simple" strategy seems to need to look
at all data from Oracle, and I would like to avoid that. And I'm a bit
unclear about the "table" strategy - it seems like I would need to manage
the timestamp and versioning from Oracle?
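To make the document shape from the second point concrete (one document per user, with the user's groups as an array), here is a small sketch that folds the rows of a join into nested documents. The column order `(user_id, user_name, group_name)` and the function name are hypothetical; the real schema spans many more tables.

```python
def rows_to_user_docs(joined_rows):
    """Fold (user_id, user_name, group_name) join rows into one doc per user."""
    docs = {}
    for user_id, user_name, group_name in joined_rows:
        # Create the user's document the first time the user is seen.
        doc = docs.setdefault(user_id, {"name": user_name, "groups": []})
        # A LEFT JOIN yields None for users without groups; skip those
        # and avoid listing the same group twice.
        if group_name is not None and group_name not in doc["groups"]:
            doc["groups"].append(group_name)
    return docs
```

Each key of the returned dictionary would serve as the ES document ID, and each value as the document source.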
Ideally, what I'd want to have is something like the MongoDB river does
with the operations log. If that's not possible, what would you recommend
for this use-case?
[0] https://github.com/jprante/elasticsearch-river-jdbc
Thanks and best regards,
Radu
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene