Hi,
I have some code that works as follows:
- Query for a document using a non-analyzed field (so I should only
get a result on an exact match)
- If the query returned something, I update a counter in the document
and re-index it (using prepareIndex with the id value from the hit)
- If the query did not return anything, then I create a new document
(using prepareIndex without an id)
My expectation is that this code should not produce any documents that
are duplicates (that have the same value in the non-analyzed field
used in the original search).
However, that's not the case. I often find several different versions
of each document.
How is this possible and what can I do to prevent it?
Are you indexing while searching, or do you have multiple clients
working at the same time that might cause an inconsistent state of
your index?
-Luke
On Tue, 2012-02-21 at 16:45 -0800, Frank LaRosa wrote:
Hi,
I have some code that works as follows:
- Query for a document using a non-analyzed field (so I should only
get a result on an exact match)
- If the query returned something, I update a counter in the document
and re-index it (using prepareIndex with the id value from the hit)
- If the query did not return anything, then I create a new document
(using prepareIndex without an id)
Because all of this is happening in parallel, you may well get two
processes checking for the same (missing) value at the same time, and
both of them end up creating new docs.
The only way to emulate a unique key in ES is by using the doc ID.
clint
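
To make the point above concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch client at all: the `Index` class is a hypothetical in-memory stand-in keyed by document ID, and `card-42` is a made-up field value. It replays the interleaving clint describes (both workers search before either one creates), then shows what changes if the ID is derived from the exact-match field value instead of being generated. Using the raw field value as the ID is an assumption made for illustration; a hash of it would serve the same purpose.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class UniqueKeySketch {
    /** A stored document: the exact-match field value plus the counter. */
    static final class Doc {
        final String key;
        final int count;
        Doc(String key, int count) { this.key = key; this.count = count; }
    }

    /** In-memory stand-in for an index, keyed by document ID (not the ES API). */
    static final class Index {
        private final Map<String, Doc> docsById = new ConcurrentHashMap<>();

        /** The "search" step: is there any document with this field value? */
        boolean exists(String key) {
            return docsById.values().stream().anyMatch(d -> d.key.equals(key));
        }

        /** The "create" step from the original post: a fresh ID on every call. */
        void createWithGeneratedId(String key) {
            docsById.put(UUID.randomUUID().toString(), new Doc(key, 1));
        }

        /** Alternative: the field value is the ID, so a repeat hits the same slot. */
        void upsertWithDerivedId(String key) {
            docsById.merge(key, new Doc(key, 1),
                    (old, fresh) -> new Doc(key, old.count + 1));
        }

        /** Number of stored documents carrying this field value. */
        long docsFor(String key) {
            return docsById.values().stream().filter(d -> d.key.equals(key)).count();
        }

        /** Counter of the document with this ID, or 0 if there is none. */
        int countFor(String id) {
            Doc d = docsById.get(id);
            return d == null ? 0 : d.count;
        }
    }

    public static void main(String[] args) {
        // Replay the bad interleaving: both workers search before either creates.
        Index racy = new Index();
        boolean workerAFound = racy.exists("card-42");
        boolean workerBFound = racy.exists("card-42");
        if (!workerAFound) racy.createWithGeneratedId("card-42");
        if (!workerBFound) racy.createWithGeneratedId("card-42");
        System.out.println("generated IDs: " + racy.docsFor("card-42") + " docs");

        // Same two workers, but the ID is derived from the field value.
        Index keyed = new Index();
        keyed.upsertWithDerivedId("card-42");
        keyed.upsertWithDerivedId("card-42");
        System.out.println("derived ID: " + keyed.docsFor("card-42")
                + " docs, counter " + keyed.countFor("card-42"));
    }
}
```

With generated IDs the two creates land in different slots, so the duplicate appears; with a derived ID the second write can only hit the document that already exists. Note that `merge` on the in-memory map is atomic, which a plain overwrite of a document in a real index is not, so the counter increment would still need separate protection against lost updates.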