We run a regular job that does a 'select ... where last_updated_at >=
$last_sync" That job updates elasticsearch with any modified records.
It also deletes records from elasticsearch where the postgres record is
marked as no longer valid (eg 'logically deleted').
We also run a separate job, currently once a day, that builds a
completely new index, then updates it with any recent changes since it
started (it takes a few hours), then switches aliases to point to the
new index. That catches any 'physical deletions'.
That combination of jobs has worked out pretty well for us so far.
We're also looking at using a semi-replication strategy. The idea is
that we'd setup a the master side of logical replication system
(for example Bucardo or http://slony.info).
Then write our own slave side that would read the event stream and
make the corresponding changes to elasticsearch.
That would give us realtime updates.
I'm also hoping that someone will implement a Foreign Data Wrapper for
Elasticsearch - Foreign data wrappers - PostgreSQL wiki
Tim.
On Wed, Oct 15, 2014 at 09:35:10AM +0200, David Pilato wrote:
My thoughts on this.
If you can modify the application which writes to your RDBMS, then you should do it from there.
If not, it depends on your data I'd say.
For example, if your application delete data instead of marking them as deleted, you will basically have
to reindex everything often.
If your database contains timestamp for your data, then you can use that to only index/remove changes.
If your entities are not flat, it means that you have to run multiple SQL calls to generate a full JSON
object. In that case, you should write your own code as a batch or so.
Le 15 oct. 2014 Ã 09:16, Jorge von Rudno <[1]jorge.vonrudno.gp@googlemail.com> a écrit :
Dear College,
I have used "river" in order to load an index in elasticsearch from postgresql and now I want to keep
the index synchronized with the database. Can I use river for this purpose?, how? or there are another
tool to do this?
At the beginning I have considered to write a trigger and a stored procedure, but If "river" is
designed to do this I will prefer use it.
kind regards.
Jorge von Rudno
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
[2]elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
[3]https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com.
For more options, visit [4]https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
[5]elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
[6]https://groups.google.com/d/msgid/elasticsearch/7F4E864F-11C4-41E1-86D4-BEA6F8D1AC36%40pilato.fr.
For more options, visit [7]https://groups.google.com/d/optout.
References
Visible links
- mailto:jorge.vonrudno.gp@googlemail.com
- mailto:elasticsearch+unsubscribe@googlegroups.com
- https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com?utm_medium=email&utm_source=footer
- https://groups.google.com/d/optout
- mailto:elasticsearch+unsubscribe@googlegroups.com
- https://groups.google.com/d/msgid/elasticsearch/7F4E864F-11C4-41E1-86D4-BEA6F8D1AC36%40pilato.fr?utm_medium=email&utm_source=footer
- https://groups.google.com/d/optout
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20141016200854.GA1615%40timac.local.
For more options, visit https://groups.google.com/d/optout.