Thanks for your interest in JDBC river.
The JDBC river uses external versioning to solve a hard challenge: the
task of moving tabular data (SQL) to hierarchical documents (JSON). With
an SQL select statement that uses labeled column names (dot notation),
arbitrary JSON can be built out of rows. It is assumed that a
single result row generates a single JSON doc.
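To illustrate the dot notation idea, here is a minimal sketch in Python (not the river's actual Java code). The function name row_to_doc and the column labels are made up for the example; it only shows how one flat result row with dotted labels could become one nested JSON doc.

```python
import json


def row_to_doc(row):
    """Build a nested dict from a flat row whose keys use dot notation."""
    doc = {}
    for label, value in row.items():
        parts = label.split(".")
        node = doc
        # Walk (and create) the intermediate objects named by the label.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return doc


# Hypothetical result row, as it might come back from
#   select id as "_id", name as "person.name", city as "person.address.city" ...
row = {"_id": 1, "person.name": "Ada", "person.address.city": "London"}
doc = row_to_doc(row)
print(json.dumps(doc))
```

Each row yields exactly one document here, which matches the one-row-one-doc assumption above.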
In a river, there are three op types: create, index, delete.
So I have to deal with updates to existing JSON docs. Well, I overwrite
these docs (which is not optimal, since it rules out partial updates).
Then, adding new docs. When new docs are added, there are two
possible interpretations: a whole new index, or incremental adding.
Deleting data is most sensitive. Unfortunately, SQL deletes do not
propagate to the index from the RDBMS. To detect row deletions, I use
generations of SQL selects. Each river cycle is a generation. The
generation is the external version in ES. To ensure only one generation
exists, I have implemented a "housekeeper" that deletes all(!) docs that
have a lower version.
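The generation scheme can be sketched with a toy in-memory index in Python. This is only a simulation under my own assumptions, not the river code and not the ES API: ToyIndex, river_cycle and housekeep are hypothetical names, and the version check just mimics the rule that an external version must be higher than the stored one.

```python
class ToyIndex:
    """In-memory stand-in for an index that stores a version per doc."""

    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version):
        # Mimic external versioning: accept only a strictly higher version.
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise ValueError("version conflict for doc %s" % doc_id)
        self.docs[doc_id] = (version, source)

    def housekeep(self, generation):
        # Delete every doc whose version is lower than the current generation.
        stale = sorted(i for i, (v, _) in self.docs.items() if v < generation)
        for doc_id in stale:
            del self.docs[doc_id]
        return stale


def river_cycle(index, rows, generation):
    """One river cycle: index all selected rows, then run the housekeeper."""
    for doc_id, source in rows.items():
        index.index(doc_id, source, generation)
    return index.housekeep(generation)


index = ToyIndex()
# Generation 1: the select returns rows 1 and 2.
river_cycle(index, {"1": {"name": "a"}, "2": {"name": "b"}}, generation=1)
# Generation 2: row 2 was deleted in the RDBMS, so only row 1 comes back.
removed = river_cycle(index, {"1": {"name": "a2"}}, generation=2)
print(removed)
```

In this toy model, any doc that was not re-indexed in the current cycle keeps its old, lower version, so the housekeeper removes it. That is how a row deletion becomes visible.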
I'm aware this is not ideal for every situation. For example, it is
incompatible with incremental updates, which many users intuitively
expect.
So I prepared a very modular architecture (river source, river flow,
river mouth) so if you need another philosophy it should be quite easy
to add your customized strategy to the code by reusing much of the
existing code. The ES community is so smart and strong, I think there
must be more JDBC to ES strategies.
I have not thought about manipulating docs manually in an existing JDBC
river index. There is a generation number in the river but it is
unstable (it will be reset at a river restart). The "housekeeper" is
very naive and can be confused by manual doc insertions.
Jörg
On 10.07.13 09:57, Ümit Seren wrote:
I have a couple of jdbc rivers that run every hour (using the simple
strategy). They work fine.
Now I have the requirement that new records should be "instantly"
visible in the search and also in some lists that get populated by ES.
In order to do this I manually add the new records to ES as soon as
they are persisted in the DB.
However, after the next river run those documents get deleted by the
housekeeping functionality of the jdbc river because they have a lower
version number than the version number of the jdbc river.
How does the jdbc river deal with new documents? Is the version of
newly added rows automatically set to the one that is stored in the
jdbc river index?
If so, when manually adding documents to the ES index, should I use
version_type = external and set the version to the one of the jdbc river?
cheers
Ümit
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.