On Wednesday, March 6, 2013 8:53:29 PM UTC+1, Yuliya wrote:
Hello all!
Please help
We are using ES to index ~1.5 million records from the database. To populate
the index we are using the Pentaho ES component, which is set to “Overwrite if
exists” (a run takes ~15 min). Also, individual indexed documents can be
retrieved, updated or deleted via Java services.
The question is what ES will return during a full Pentaho update run. For
example, we have 1.5 million indexed documents with version = 1. The next update
will change this version to 2. If we request a document while Pentaho is
updating it, will we receive the old version of it? Will the service be
unavailable for that particular document? Also, if we receive an old
version, will the new version be available immediately after the update, or will
it wait until the full batch is updated (the Pentaho component sends rows in
batches of 5k)?
If you query while the update is running, you will see the updated results in
near real time. There is a delay between an update and the point where it
becomes searchable, but by default we refresh every second, so results are
pretty much up to date. 5k docs should be indexed quite quickly too. While you
are updating you will always find your documents, yet you might still see an
old version depending on how close to the update you search for it.
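
To make the timing concrete, here is a toy Java model of the behaviour described above. It is only a sketch and does not use the Elasticsearch API: the class NearRealTimeSketch and its methods index, refresh and search are hypothetical names invented for illustration. Writes land in a pending buffer and become visible to search only after a refresh, which in ES happens every second by default.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of near-real-time search. NOT the Elasticsearch API. */
public class NearRealTimeSketch {
    // Documents visible to search (state as of the last refresh).
    private final Map<String, Integer> searchable = new HashMap<>();
    // Writes accepted since the last refresh, not yet visible to search.
    private final Map<String, Integer> pending = new HashMap<>();

    /** Accept a write ("overwrite if exists"); search does not see it yet. */
    public void index(String id, int version) {
        pending.put(id, version);
    }

    /** Make all pending writes visible, as the periodic refresh would. */
    public void refresh() {
        searchable.putAll(pending);
        pending.clear();
    }

    /** Version a search would return, or null if nothing is visible yet. */
    public Integer search(String id) {
        return searchable.get(id);
    }

    public static void main(String[] args) {
        NearRealTimeSketch index = new NearRealTimeSketch();
        index.index("doc-1", 1);
        index.refresh();                            // version 1 is searchable
        index.index("doc-1", 2);                    // update arrives mid-run
        System.out.println(index.search("doc-1"));  // old version still served
        index.refresh();
        System.out.println(index.search("doc-1"));  // new version now visible
    }
}
```

In this model the document never disappears: a search between the write and the next refresh returns the old version, and the new one shows up after the refresh, regardless of whether the rest of the batch has been sent.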