Pentaho-ElasticSearch

Hello all!

Please help :)

We are using ES to index ~1.5 million records from the database. To populate
the index we are using the Pentaho ES component, which is set to “Overwrite if
exists” (a run takes ~15 min). Also, individual indexed documents can be
retrieved, updated or deleted via Java services.
The question is: what will ES return during a full Pentaho update run? For
example, we have 1.5 million indexed documents with version = 1. The next
update will change this version to 2. If we request a document while Pentaho
is updating it, will we receive the old version of it? Will the service be
unavailable for that particular document? Also, if we receive an old
version, will the new version be available immediately after the update, or
will it wait until the full batch is updated (the Pentaho component sends rows
in batches of 5k)?

Pentaho - 4.4
ElasticSearch - 0.19.4
Lucene - 3.6.0

Thank you!
Yuliya

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

hey there,

On Wednesday, March 6, 2013 8:53:29 PM UTC+1, Yuliya wrote:

Hello all!

Please help :)

We are using ES to index ~1.5 million records from the database. To populate
the index we are using the Pentaho ES component, which is set to “Overwrite if
exists” (a run takes ~15 min). Also, individual indexed documents can be
retrieved, updated or deleted via Java services.
The question is: what will ES return during a full Pentaho update run? For
example, we have 1.5 million indexed documents with version = 1. The next
update will change this version to 2. If we request a document while Pentaho
is updating it, will we receive the old version of it? Will the service be
unavailable for that particular document? Also, if we receive an old
version, will the new version be available immediately after the update, or
will it wait until the full batch is updated (the Pentaho component sends rows
in batches of 5k)?

If you query while updating, you will see the updated results in near real
time: there is a delay between an update and the document being searchable,
but by default we refresh every second, so results are pretty much up to
date. 5k docs should be indexed quite quickly too. While you are updating you
will always find your documents, but you might still see the old version,
depending on how close to the update you search for it.
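
In case it helps to picture it, here is a rough toy model of that behaviour
in Python. It is only a sketch and does not use the Elasticsearch API: the
class ToyNearRealTimeIndex and its methods are made-up names, and refresh()
is called by hand where the real index would refresh on a timer (every second
by default). It shows that a search keeps returning the old copy of a
document until the next refresh, that the document never goes missing in
between, and that the refresh does not wait for the rest of a batch.

```python
class ToyNearRealTimeIndex:
    """Toy model of refresh-based visibility. NOT the Elasticsearch API."""

    def __init__(self):
        self._searchable = {}  # what a search sees: state as of the last refresh
        self._pending = {}     # writes accepted since the last refresh

    def index(self, doc_id, source):
        """'Overwrite if exists' write: bump the version, buffer until refresh."""
        current = self._pending.get(doc_id) or self._searchable.get(doc_id)
        version = current["version"] + 1 if current else 1
        self._pending[doc_id] = {"version": version, "source": source}
        return version

    def refresh(self):
        """Make every buffered write visible to searches."""
        self._searchable.update(self._pending)
        self._pending.clear()

    def search(self, doc_id):
        """Return the searchable copy of a document, or None if there is none."""
        return self._searchable.get(doc_id)


idx = ToyNearRealTimeIndex()

# Initial load: five documents, all at version 1.
for i in range(5):
    idx.index(i, {"name": f"row {i}"})
idx.refresh()

# Second run: a batch that rewrites documents 0..4, interrupted by a refresh.
idx.index(0, {"name": "row 0 (updated)"})
idx.index(1, {"name": "row 1 (updated)"})
seen_before_refresh = idx.search(0)["version"]  # old copy, still found

idx.refresh()  # timer-style refresh fires in the middle of the batch

idx.index(2, {"name": "row 2 (updated)"})
seen_after_refresh = idx.search(0)["version"]   # new copy is now visible
seen_rest_of_batch = idx.search(2)["version"]   # written after the refresh

print(seen_before_refresh, seen_after_refresh, seen_rest_of_batch)
```

Document 2 stays at its old version here only because no refresh has happened
since it was rewritten; one more refresh() would expose it as well.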

hope that answers your question

simon



Thank you, Simon, for the fast response!
