Correct. I'm indexing a doc, then adding attachments via the update API.
I'm using a two-phase approach to get the basic information indexed more
quickly instead of waiting for all associated attachments to be indexed up
front. This way I'm able to index over 1M documents with all data other
than attachments in ~20 mins instead of waiting for ~200K attachments to
be indexed, which takes much longer.
Anyway, I increased the retryOnConflict to 20 and the
VersionConflictEngineException went away.
On Tuesday, March 27, 2012 2:50:36 PM UTC-4, kimchy wrote:
Not really sure I understand..., are you indexing a doc, and then adding
attachments to it using update API? Why not index the whole doc with all
the attachments at once?
On Tue, Mar 27, 2012 at 4:45 PM, Shane Witbeck <email@example.com> wrote:
For background, I have documents that are currently indexed in two phases:
- First phase is fairly quick and indexes all basic data.
- Second phase is potentially longer running since it's using the
attachment-mapper plugin to update the document with possibly several
attachments ranging in size from a few KB to several MB.
The idea here is to get all documents indexed as quickly as possible with
basic data then have several workers index the related attachments. While I
understand your design decision to not block the indexing process, I think
this should be mentioned in the documentation somewhere. I'll try your
suggestion of increasing the retryOnConflict but it seems better in my case
to manually synchronize the second phase to minimize indexing time and
resources. Would you agree this is the better approach?
On Tuesday, March 27, 2012 10:30:51 AM UTC-4, kimchy wrote:
What do you mean by handling it more gracefully? What do you have in
mind? The main idea here is not to block the indexing process while the
update is happening. You can have a high retry on conflict value...
On Tue, Mar 27, 2012 at 3:39 PM, Shane Witbeck <firstname.lastname@example.org> wrote:
That makes sense if I have two or more processes updating the same doc.
Is there a way to handle this more gracefully other than increasing the
retryOnConflict? Otherwise, I'll have to synchronize my processes to only
do one update at a time for a document.
On Tuesday, March 27, 2012 9:22:33 AM UTC-4, kimchy wrote:
The way update works is by reading the document from the shard, and
then indexing it (using the version it was read with). A version conflict
can occur if two updates on the same doc happen at the same time and
interleave. You can set retryOnConflict to a higher value to automatically
retry the update when a conflict happens.
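To make the read-then-index cycle concrete, here is a small in-memory model of it. It is a toy sketch, not Elasticsearch code: `ToyShard`, `VersionConflict`, and `update` are made-up names, and the only thing borrowed from the real API is the idea of a retry-on-conflict count.

```python
import threading


class VersionConflict(Exception):
    """Toy stand-in for a version conflict on write."""


class ToyShard:
    """In-memory store that bumps a version number on every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)
        self._lock = threading.Lock()

    def get(self, doc_id):
        with self._lock:
            version, source = self._docs[doc_id]
            return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        with self._lock:
            current = self._docs.get(doc_id, (0, None))[0]
            # Reject the write if the doc changed since it was read.
            if expected_version is not None and expected_version != current:
                raise VersionConflict(
                    f"{doc_id}: expected v{expected_version}, found v{current}"
                )
            self._docs[doc_id] = (current + 1, dict(source))
            return current + 1


def update(shard, doc_id, change, retry_on_conflict=0):
    """Read the doc, apply change(source), write back with the read version.

    On a conflict, re-read and try again up to retry_on_conflict times.
    """
    attempts = 0
    while True:
        version, source = shard.get(doc_id)
        new_source = change(source)
        try:
            return shard.index(doc_id, new_source, expected_version=version)
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise
            attempts += 1
```

If another writer slips in between the `get` and the `index`, the write is rejected because the version no longer matches. With a retry count above zero, the update simply re-reads the newer doc and applies the change again, so neither writer's change is lost.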
On Mon, Mar 26, 2012 at 10:21 PM, Shane Witbeck wrote:
I'm using a couple of workers to update existing documents with one
or more attachments/files. 99% of the time this works, but once in a
while I'm getting the following exception:
The update code is here:
Any ideas why the exception would occur?