Postgres->elasticsearch realtime update

Dear College,

I have used "river" in order to load an index in elasticsearch from
postgresql and now I want to keep the index synchronized with the database.
Can I use river for this purpose?, how? or there are another tool to do
this?

At the beginning I have considered to write a trigger and a stored
procedure, but If "river" is designed to do this I will prefer use it.

kind regards.

Jorge von Rudno

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

My thoughts on this.

If you can modify the application which writes to your RDBMS, then you should do it from there.
If not, it depends on your data I'd say.

For example, if your application delete data instead of marking them as deleted, you will basically have to reindex everything often.
If your database contains timestamp for your data, then you can use that to only index/remove changes.

If your entities are not flat, it means that you have to run multiple SQL calls to generate a full JSON object. In that case, you should write your own code as a batch or so.

Le 15 oct. 2014 à 09:16, Jorge von Rudno jorge.vonrudno.gp@googlemail.com a écrit :

Dear College,

I have used "river" in order to load an index in elasticsearch from postgresql and now I want to keep the index synchronized with the database. Can I use river for this purpose?, how? or there are another tool to do this?

At the beginning I have considered to write a trigger and a stored procedure, but If "river" is designed to do this I will prefer use it.

kind regards.

Jorge von Rudno

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7F4E864F-11C4-41E1-86D4-BEA6F8D1AC36%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.

If you have an SQL command that can update your data and you can live with
a delay in the minute range, you will be fine with JDBC river.

If you want to track all deletes/updates/insert no matter if there is an
SQL command, or if you want instant updates (aka "push", e.g. with
trigger), this is not possible with JDBC river.

Jörg

On Wed, Oct 15, 2014 at 9:16 AM, Jorge von Rudno <
jorge.vonrudno.gp@googlemail.com> wrote:

Dear College,

I have used "river" in order to load an index in elasticsearch from
postgresql and now I want to keep the index synchronized with the database.
Can I use river for this purpose?, how? or there are another tool to do
this?

At the beginning I have considered to write a trigger and a stored
procedure, but If "river" is designed to do this I will prefer use it.

kind regards.

Jorge von Rudno

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGR%3DdxpFhBO8%2B303u%2B6XD3odhXJkbEK%3DVkdoshDo_2E%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

We run a regular job that does a 'select ... where last_updated_at >=
$last_sync" That job updates elasticsearch with any modified records.
It also deletes records from elasticsearch where the postgres record is
marked as no longer valid (eg 'logically deleted').

We also run a separate job, currently once a day, that builds a
completely new index, then updates it with any recent changes since it
started (it takes a few hours), then switches aliases to point to the
new index. That catches any 'physical deletions'.

That combination of jobs has worked out pretty well for us so far.

We're also looking at using a semi-replication strategy. The idea is
that we'd setup a the master side of logical replication system
(for example http://bucardo.org/wiki/Bucardo or http://slony.info).
Then write our own slave side that would read the event stream and
make the corresponding changes to elasticsearch.

That would give us realtime updates.

I'm also hoping that someone will implement a Foreign Data Wrapper for
Elasticsearch - https://wiki.postgresql.org/wiki/Foreign_data_wrappers

Tim.

On Wed, Oct 15, 2014 at 09:35:10AM +0200, David Pilato wrote:

My thoughts on this.
If you can modify the application which writes to your RDBMS, then you should do it from there.
If not, it depends on your data I'd say.
For example, if your application delete data instead of marking them as deleted, you will basically have
to reindex everything often.
If your database contains timestamp for your data, then you can use that to only index/remove changes.
If your entities are not flat, it means that you have to run multiple SQL calls to generate a full JSON
object. In that case, you should write your own code as a batch or so.
Le 15 oct. 2014 Ã 09:16, Jorge von Rudno <[1]jorge.vonrudno.gp@googlemail.com> a écrit :

 Dear College,

 I have used "river" in order to load an index in elasticsearch from postgresql and now I want to keep
 the index synchronized with the database. Can I use river for this purpose?, how? or there are another
 tool to do this?

 At the beginning I have considered to write a trigger and a stored procedure, but If "river" is
 designed to do this I will prefer use it.

 kind regards.

 Jorge von Rudno

 --
 You received this message because you are subscribed to the Google Groups "elasticsearch" group.
 To unsubscribe from this group and stop receiving emails from it, send an email to
 [2]elasticsearch+unsubscribe@googlegroups.com.
 To view this discussion on the web visit
 [3]https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com.
 For more options, visit [4]https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
[5]elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
[6]https://groups.google.com/d/msgid/elasticsearch/7F4E864F-11C4-41E1-86D4-BEA6F8D1AC36%40pilato.fr.
For more options, visit [7]https://groups.google.com/d/optout.

References

Visible links

  1. mailto:jorge.vonrudno.gp@googlemail.com
  2. mailto:elasticsearch+unsubscribe@googlegroups.com
  3. https://groups.google.com/d/msgid/elasticsearch/CAFqKu%3DaF86B2nqc9aXyUA77OXjZOXQihfT1x0QbuLpNWXe8HnQ%40mail.gmail.com?utm_medium=email&utm_source=footer
  4. https://groups.google.com/d/optout
  5. mailto:elasticsearch+unsubscribe@googlegroups.com
  6. https://groups.google.com/d/msgid/elasticsearch/7F4E864F-11C4-41E1-86D4-BEA6F8D1AC36%40pilato.fr?utm_medium=email&utm_source=footer
  7. https://groups.google.com/d/optout

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20141016200854.GA1615%40timac.local.
For more options, visit https://groups.google.com/d/optout.