I'm using a couple of workers to update existing documents with one or more
attachments/files. 99% of the time this works, but once in a while I get a
VersionConflictEngineException.
The way update works is by reading the document from the shard and then
indexing it (using the version it was read with). A version conflict can
happen if two updates on the same doc run at the same time and interleave.
You can set retryOnConflict to a higher value to automatically retry the
update when that happens.
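To make the read-then-index cycle concrete, here is a minimal sketch of versioned updates with retries. It is a toy in-memory model, not the Elasticsearch client: `VersionedStore`, `VersionConflict`, `update`, and the `retry_on_conflict` argument are hypothetical names chosen for illustration.

```python
import copy
import threading


class VersionConflict(Exception):
    """Raised when a write carries a version that is no longer current."""


class VersionedStore:
    """Toy in-memory stand-in for a shard (hypothetical, not the real API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)
        self._lock = threading.Lock()

    def get(self, doc_id):
        with self._lock:
            version, source = self._docs[doc_id]
            return version, copy.deepcopy(source)

    def index(self, doc_id, source, expected_version=None):
        with self._lock:
            current = self._docs.get(doc_id, (0, None))[0]
            if expected_version is not None and expected_version != current:
                raise VersionConflict(
                    f"{doc_id}: read v{expected_version}, now v{current}")
            self._docs[doc_id] = (current + 1, copy.deepcopy(source))
            return current + 1


def update(store, doc_id, change, retry_on_conflict=0):
    """Read the doc, apply `change`, index with the version that was read.

    On a conflict the whole read-modify-write cycle is repeated, up to
    `retry_on_conflict` additional times.
    """
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)
        change(source)
        try:
            return store.index(doc_id, source, expected_version=version)
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise


store = VersionedStore()
store.index("doc-1", {"title": "report", "attachments": []})

# Interleaving by hand: one worker reads, another writes, the first writes.
stale_version, stale_source = store.get("doc-1")
update(store, "doc-1", lambda d: d["attachments"].append("b.pdf"))
stale_source["attachments"].append("a.pdf")
try:
    store.index("doc-1", stale_source, expected_version=stale_version)
    conflict_seen = False
except VersionConflict:
    conflict_seen = True  # the stale write is rejected

# Several workers updating the same doc, each allowed to retry.
names = [f"file-{i}.pdf" for i in range(5)]
threads = [
    threading.Thread(
        target=update,
        args=(store, "doc-1", lambda d, n=n: d["attachments"].append(n), 20),
    )
    for n in names
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_version, final_source = store.get("doc-1")
```

In this model a worker can only lose a round when another worker has written in between, so with five workers doing one update each, a retry budget of 20 is never exhausted. Real workloads with larger documents and more writers may behave differently.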
That makes sense if I have two or more processes updating the same doc. Is
there a way to handle this more gracefully other than increasing the
retryOnConflict? Otherwise, I'll have to synchronize my processes to only
do one update at a time for a document.
Thanks,
Shane
On Tuesday, March 27, 2012 9:22:33 AM UTC-4, kimchy wrote:
The way update works is by reading the document from the shard, and then
indexing it (using the version it was read with). There might be a version
conflict happening if two updates on the same doc happen at the same time,
and then interleave. You can set the retryOnConflict to a higher value to
automatically retry the update if it happens.
What do you mean by handling it more gracefully? What do you have in mind?
The main idea here is not to block the indexing process while the update is
happening. You can have a high retry on conflict value...
For background, I have documents that are currently indexed in two phases.
The first phase is fairly quick and indexes all basic data.
The second phase is potentially longer running since it uses the
attachment-mapper plugin to update the document with possibly several
attachments ranging in size from a few KB to several MB.
The idea here is to get all documents indexed as quickly as possible with
basic data, then have several workers index the related attachments. While I
understand your design decision to not block the indexing process, I think
this should be mentioned in the documentation somewhere. I'll try your
suggestion of increasing the retryOnConflict, but it seems better in my case
to manually synchronize the second phase to minimize indexing time and
resources. Would you agree this is the better approach?
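One possible way to synchronize the second phase, sketched under my own assumptions rather than anything confirmed in this thread: route every attachment to a worker chosen by a stable hash of its document id, so that a given document is only ever updated by one worker. The names `worker_for` and `partition` and the sample ids are hypothetical.

```python
import zlib
from collections import defaultdict


def worker_for(doc_id, num_workers):
    """Stable mapping from a document id to a worker index."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_workers


def partition(attachments, num_workers):
    """Group (doc_id, attachment) pairs so each doc lands on one worker."""
    queues = defaultdict(list)
    for doc_id, attachment in attachments:
        queues[worker_for(doc_id, num_workers)].append((doc_id, attachment))
    return queues


# Hypothetical sample data: several attachments per document.
attachments = [
    ("doc-1", "a.pdf"),
    ("doc-2", "b.pdf"),
    ("doc-1", "c.pdf"),
    ("doc-3", "d.pdf"),
    ("doc-2", "e.pdf"),
]
queues = partition(attachments, 3)
```

Each worker then drains its own queue sequentially, so two updates to the same document can never interleave. The trade-off is that documents with many large attachments can leave one worker busier than the others.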
Thanks,
Shane
On Tuesday, March 27, 2012 10:30:51 AM UTC-4, kimchy wrote:
What do you mean by handling it more gracefully? What do you have in mind?
The main idea here is not to block the indexing process while the update is
happening. You can have a high retry on conflict value...
Not really sure I understand..., are you indexing a doc, and then adding
attachments to it using update API? Why not index the whole doc with all
the attachments at once?
Correct. I'm indexing a doc, then adding attachments via update API. I'm
doing a two-phase approach to get the basic information indexed more
quickly instead of waiting for all associated attachments to get indexed up
front. This way I'm able to index over 1M documents with all data other
than attachments in ~20 mins instead of waiting for ~200K attachments to
get indexed which takes much longer.
Anyway, I increased the retryOnConflict to 20 and the
VersionConflictEngineException went away.
Thanks,
Shane
On Tuesday, March 27, 2012 2:50:36 PM UTC-4, kimchy wrote:
Not really sure I understand..., are you indexing a doc, and then adding
attachments to it using update API? Why not index the whole doc with all
the attachments at once?