Best practice for getting data out of an RDBMS (PostgreSQL)?

Hi, I've looked into setting up the JDBC river from Postgres, but I'm running
into an issue where only one-time snapshots work. Just curious: how is
everyone else pushing data to ES and handling changes?
I was going to dig deeper into the JDBC river and see if I can get deltas
working; currently there are a few bugs/issues with it. Another thought was
to use index aliases: build a new index, then switch the alias by adding
the new index and removing the old one. I haven't fully figured out how to
automate that process, though. Any feedback is much appreciated.

-Alexey

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

For the JDBC river, I set out to implement only a demonstration of how
data can be read from the tabular data model of an RDBMS and moved into
the JSON doc model, without providing configuration for all the data
domains that are possible. If you can provide a single SQL statement for
fetching the data, you get it.
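
As a rough illustration of that idea (one SQL statement in, JSON documents
out), here is a minimal Python sketch. It uses the standard-library sqlite3
module as a stand-in for PostgreSQL so that it runs anywhere. The table, the
column names, and the convention of nesting on dotted column labels are
assumptions made for this example, not a description of the river's
configuration.

```python
import json
import sqlite3


def row_to_doc(columns, row):
    """Turn one result row into a dict; a label like 'author.name' nests."""
    doc = {}
    for label, value in zip(columns, row):
        target = doc
        *parents, leaf = label.split(".")
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return doc


def fetch_docs(conn, sql):
    """Run a single SQL statement and return one document per row."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return [row_to_doc(columns, row) for row in cur]


# Hypothetical sample data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_name TEXT)"
)
conn.execute("INSERT INTO books VALUES (1, 'Dune', 'Frank Herbert')")

docs = fetch_docs(
    conn, 'SELECT id, title, author_name AS "author.name" FROM books'
)
print(json.dumps(docs[0]))
```

The shape of each document is decided entirely by the column labels in the
SELECT, which is why a single statement is enough for simple cases.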

Automating the creation of deltas on relational data depends on the
domain of the relational model, which is difficult. A match with the
JSON document model is not always guaranteed, nor is a match with the ES
index/type/docid style of addressing.

So the first challenge in moving updates from tabular data to ES is
identifying the updates and assigning ES doc IDs to them, or mapping them
to other ES features, like parent/child.
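
One common way to identify updates, sketched below in Python, is to keep a
last-modified timestamp on each row and to derive the ES doc ID from the
primary key. This is only an illustration under assumptions: the `books`
table, its `updated_at` column, and the index and type names are all
hypothetical, and sqlite3 stands in for PostgreSQL. Note that this approach
does not notice deleted rows.

```python
import sqlite3


def changed_rows(conn, since):
    """Return rows modified after `since` (an ISO-8601 timestamp string)."""
    cur = conn.execute(
        "SELECT id, title, updated_at FROM books "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def to_action(row, index="books", doc_type="book"):
    """Pair a row with the index/type/docid address it should be written to."""
    pk, title, updated_at = row
    # Using the primary key as the doc ID means a re-index of the same row
    # overwrites the previous document instead of creating a duplicate.
    address = {"_index": index, "_type": doc_type, "_id": str(pk)}
    return address, {"title": title, "updated_at": updated_at}


# Hypothetical sample data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(1, "Dune", "2013-06-01T10:00:00"), (2, "Emma", "2013-06-06T09:30:00")],
)

last_run = "2013-06-05T00:00:00"
actions = [to_action(row) for row in changed_rows(conn, last_run)]
```

Here only the row touched after `last_run` is selected for indexing.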

Also, because ES simply overwrites docs that have the same ID, you have
to take care of how you version your bulk indexing runs. Versioning may be
accomplished with ES versions (as I tried) or with separate ES indexes,
maybe daily ones, plus automatic cleanup of old versions. With this
strategy, it is straightforward to move all of the data you have.
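
The daily-index variant could be organized roughly as follows. This Python
sketch only computes names; the `base-YYYY.MM.DD` naming pattern and the
retention window are assumptions for the example, and actually creating or
deleting the indexes is left out.

```python
from datetime import date, timedelta


def daily_index(base, day):
    """Name of the index that holds the full load for one day."""
    return f"{base}-{day:%Y.%m.%d}"


def indexes_to_delete(existing, base, today, keep_days):
    """Indexes with the base prefix that are not among the newest daily names."""
    keep = {
        daily_index(base, today - timedelta(days=n)) for n in range(keep_days)
    }
    prefix = base + "-"
    return sorted(
        name for name in existing
        if name.startswith(prefix) and name not in keep
    )


# Hypothetical index listing, only here to make the sketch runnable.
existing = [
    "books-2013.06.04",
    "books-2013.06.05",
    "books-2013.06.06",
    "users-2013.06.01",
]
stale = indexes_to_delete(existing, "books", date(2013, 6, 6), keep_days=2)
```

With two days kept, only the oldest `books` index is marked for removal, and
indexes with a different base name are left alone.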

It does not make much sense to fiddle with single fields and update them
in existing ES documents because, with many updates, the overhead soon
becomes significant and does not scale well in search engines with
inverted indexes like ES. Switching to a fresh index is faster, and ES
supports this cool feature.
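
Switching to a fresh index is usually done by moving an alias in a single
request to the `_aliases` endpoint. The Python sketch below only builds the
request body; the alias and index names are hypothetical, and sending the
body (an HTTP POST to `/_aliases`) is left to whatever client you use.

```python
import json


def alias_swap(alias, old_index, new_index):
    """Build an _aliases body that points `alias` at `new_index`."""
    actions = []
    if old_index is not None:
        # Removing and adding in the same request keeps the switch atomic,
        # so searches never see the alias pointing at nothing.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


body = alias_swap("books", "books-2013.06.05", "books-2013.06.06")
print(json.dumps(body, indent=2))
```

Applications keep searching against the alias name, so they do not need to
know which concrete index is currently live.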

Jörg

On 06.06.2013 21:40, Alexey Volochenko wrote:

Hi, I've looked into setting up the JDBC river from Postgres, but I'm
running into an issue where only one-time snapshots work. Just curious:
how is everyone else pushing data to ES and handling changes?
I was going to dig deeper into the JDBC river and see if I can get deltas
working; currently there are a few bugs/issues with it. Another thought
was to use index aliases: build a new index, then switch the alias by
adding the new index and removing the old one. I haven't fully figured
out how to automate that process, though. Any feedback is much appreciated.

-Alexey
