VersionConflictEngineException in 0.19.1 when updating via mapper attachments plugin

On Mon, Mar 26, 2012 at 10:21 PM, Shane Witbeck <shane@digitalsanctum.com> wrote:

I'm using a couple of workers to update existing documents with one or more
attachments/files. 99% of the time this works, but once in a while I get the
following exception:

https://gist.github.com/2209367

The update code is here:

https://gist.github.com/2209383

Any ideas why the exception would occur?

On Tuesday, March 27, 2012 9:22:33 AM UTC-4, kimchy wrote:

The way update works is by reading the document from the shard and then
indexing it again, using the version it was read with. If two updates to the
same doc run at the same time and interleave, the later write carries a stale
version and fails with a version conflict. You can set retryOnConflict to a
higher value so that the update is retried automatically when this happens.
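
To make the read-then-write race concrete, here is a minimal Python simulation (not the Elasticsearch client) of a store that rejects writes carrying a stale version. `VersionedStore`, `VersionConflict` and `update` are hypothetical names; the `retry_on_conflict` argument only mirrors the idea behind the real retryOnConflict setting: on a conflict, re-read the document and reapply the change.

```python
import copy


class VersionConflict(Exception):
    """Stale-version write (hypothetical stand-in for VersionConflictEngineException)."""


class VersionedStore:
    """Toy in-memory store where every successful write bumps the version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (source, version)

    def get(self, doc_id):
        source, version = self._docs[doc_id]
        return copy.deepcopy(source), version

    def index(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (None, 0))[1]
        # Optimistic check: reject the write if someone else wrote in between.
        if expected_version is not None and expected_version != current:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, found {current}")
        self._docs[doc_id] = (copy.deepcopy(source), current + 1)
        return current + 1


def update(store, doc_id, apply_change, retry_on_conflict=0):
    """Read the doc, apply the change, write back with the version that was read.

    On a conflict, re-read and reapply the change, allowing up to
    `retry_on_conflict` extra attempts before re-raising."""
    for attempt in range(retry_on_conflict + 1):
        source, version = store.get(doc_id)
        apply_change(source)
        try:
            return store.index(doc_id, source, expected_version=version)
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise


store = VersionedStore()
store.index("doc1", {"title": "basic data", "attachments": []})

# Two workers read version 1 before either of them writes back.
a_src, a_ver = store.get("doc1")
b_src, b_ver = store.get("doc1")
a_src["attachments"].append("a.pdf")
store.index("doc1", a_src, expected_version=a_ver)  # succeeds, now version 2
b_src["attachments"].append("b.pdf")
try:
    store.index("doc1", b_src, expected_version=b_ver)  # still carries version 1
except VersionConflict as err:
    print("conflict:", err)

# Worker B recovers the way update() does on each attempt: re-read (now
# version 2) and reapply its change on top of A's.
update(store, "doc1", lambda src: src["attachments"].append("b.pdf"),
       retry_on_conflict=3)
```

Nothing is locked while a worker holds the document, so the cost of a conflict is one extra read and write, which is cheap as long as conflicts stay rare.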

On Tue, Mar 27, 2012 at 3:39 PM, Shane Witbeck <shane@digitalsanctum.com> wrote:

That makes sense if I have two or more processes updating the same doc. Is
there a way to handle this more gracefully than increasing retryOnConflict?
Otherwise, I'll have to synchronize my processes so that only one update runs
at a time for a given document.

Thanks,
Shane
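
If the workers are threads in a single process, one way to "synchronize" is a lock per document id: read-modify-write cycles on the same document can then never interleave, while updates to different documents still run in parallel. A sketch under that assumption; `DocLocks` and `add_attachment` are hypothetical names, and the `docs` dict only stands in for the index.

```python
import threading
from collections import defaultdict


class DocLocks:
    """One lock per document id; only serializes threads in this process."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def for_doc(self, doc_id):
        # The guard stops two threads from creating two locks for one id.
        with self._guard:
            return self._locks[doc_id]


docs = {"doc1": {"attachments": []}, "doc2": {"attachments": []}}  # stand-in index
locks = DocLocks()


def add_attachment(doc_id, name):
    # Read-modify-write on one document happens under that document's lock,
    # so two updates to the same doc cannot interleave.
    with locks.for_doc(doc_id):
        current = list(docs[doc_id]["attachments"])  # read
        current.append(name)                         # modify
        docs[doc_id]["attachments"] = current        # write back


threads = [
    threading.Thread(target=add_attachment, args=(f"doc{i % 2 + 1}", f"file{i}.pdf"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This does not help across separate processes or machines, which would need an external lock or a way to route each document to a single worker. The lock table also grows with every document id it has seen.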

On Tuesday, March 27, 2012 10:30:51 AM UTC-4, kimchy wrote:

What do you mean by handling it more gracefully? What do you have in mind?
The main idea here is not to block the indexing process while an update is in
progress. You can set a high retryOnConflict value...

On Tue, Mar 27, 2012 at 4:45 PM, Shane Witbeck <shane@digitalsanctum.com> wrote:

For background, my documents are currently indexed in two phases:

  1. The first phase is fairly quick and indexes all the basic data.
  2. The second phase can run much longer, since it uses the
     attachment-mapper plugin to update each document with possibly several
     attachments ranging in size from a few KB to several MB.

The idea is to get all documents indexed as quickly as possible with their
basic data, then have several workers index the related attachments. While I
understand your design decision not to block the indexing process, I think
this should be mentioned somewhere in the documentation. I'll try your
suggestion of increasing retryOnConflict, but in my case it seems better to
synchronize the second phase manually to minimize indexing time and
resources. Would you agree this is the better approach?

Thanks,

Shane
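
One way to synchronize the second phase without any locks is to make every document owned by exactly one worker, for example by hashing the document id, so two workers never update the same document at the same time. A hypothetical sketch; `worker_for` and `partition` are illustrative names, not part of any client library, and it assumes each worker processes its own queue sequentially.

```python
import zlib
from collections import defaultdict


def worker_for(doc_id, num_workers):
    """Pick the single worker that owns a document."""
    # zlib.crc32 gives the same value in every process; the built-in hash()
    # of a str is randomized per interpreter run, so workers could disagree.
    return zlib.crc32(doc_id.encode("utf-8")) % num_workers


def partition(jobs, num_workers):
    """Group (doc_id, attachment) jobs into one queue per worker.

    All jobs for one document land in the same queue, in their original
    order, so only that worker ever updates the document."""
    queues = defaultdict(list)
    for doc_id, attachment in jobs:
        queues[worker_for(doc_id, num_workers)].append((doc_id, attachment))
    return queues


jobs = [("doc1", "a.pdf"), ("doc2", "b.doc"), ("doc1", "c.pdf"), ("doc3", "d.txt")]
queues = partition(jobs, num_workers=3)
for worker, queue in sorted(queues.items()):
    print(worker, queue)
```

The trade-off is load balance: a document with many large attachments keeps its owner busy while other workers may sit idle.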

On Tuesday, March 27, 2012 2:50:36 PM UTC-4, kimchy wrote:

Not really sure I understand... Are you indexing a doc and then adding
attachments to it using the update API? Why not index the whole doc with all
the attachments at once?
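
kimchy's alternative is to send each document once, with all of its attachments included, so there is no later update to conflict with. A minimal sketch of building such a body; it assumes the mapper attachments plugin's convention of base64-encoded file content, while the `files`/`name`/`content` layout and `build_document` are hypothetical and would have to match your own mapping (not verified here for 0.19).

```python
import base64
import json


def build_document(basic_fields, attachments):
    """Return one JSON body holding the basic data plus every attachment.

    `attachments` maps a file name to its raw bytes. The content is
    base64-encoded for the attachment mapper; the "files"/"name"/"content"
    layout is a hypothetical choice.
    """
    body = dict(basic_fields)
    body["files"] = [
        {"name": name, "content": base64.b64encode(data).decode("ascii")}
        for name, data in attachments.items()
    ]
    return json.dumps(body)


body = build_document({"title": "Quarterly report"},
                      {"summary.txt": b"hello", "scan.bin": b"\x00\x01"})
# `body` would then be sent in a single index request for the document.
```

The cost is that a document only becomes searchable once all of its attachments have been read and encoded.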

Shane Witbeck wrote:

Correct. I'm indexing a doc, then adding attachments via the update API. I'm
using a two-phase approach to get the basic information indexed quickly
instead of waiting for all the associated attachments to be indexed up front.
This way I can index over 1M documents, with all data other than
attachments, in ~20 minutes, instead of waiting for ~200K attachments to be
indexed, which takes much longer.

Anyway, I increased retryOnConflict to 20 and the
VersionConflictEngineException went away.

Thanks,
Shane
