Thank you Clint, very good advice about a separate index for unique values.
I'll try to use it, but first I'm trying to find a bug in my code, as you
suggested, and the only thing I suspect is the following:
To make sure I'm receiving a complete search result, I'm checking
response.getFailedShards()==0
But I also found response.getSuccessfulShards()
I was thinking
that response.getFailedShards()+response.getSuccessfulShards()==response.getTotalShards()
but now I'm not sure.
Maybe instead of checking response.getFailedShards()==0 I should be
checking response.getSuccessfulShards()==response.getTotalShards().
Could someone please clarify?
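
To make the two candidate checks concrete, here is a minimal sketch that applies both at once. ShardCheck and isComplete are hypothetical names, and the three int parameters stand in for the values returned by response.getTotalShards(), response.getSuccessfulShards() and response.getFailedShards(). It assumes only that the counters might not always add up, which is exactly the open question above:

```java
// Hypothetical helper, not part of the Elasticsearch API. The three ints
// stand in for response.getTotalShards(), response.getSuccessfulShards()
// and response.getFailedShards().
final class ShardCheck {
    private ShardCheck() {}

    // Treats a result as complete only if no shard reported a failure AND
    // every shard reported success. If failed + successful always equals
    // total, the second condition is redundant but harmless.
    static boolean isComplete(int totalShards, int successfulShards, int failedShards) {
        return failedShards == 0 && successfulShards == totalShards;
    }
}
```

With this check, a response where 4 of 5 shards succeeded and none is reported as failed would still be treated as incomplete.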
Thank you,
Eugene S.
On Thursday, July 19, 2012 4:26:59 AM UTC-4, Clinton Gormley wrote:
Hi Eugene
One field of my document is a not-analyzed string, except that it is
lowercased. I need to keep this field unique, but it is not an ID or
anything like that, just a field.
So, when I need to index a new document, I check whether any document
with that value in this field already exists by running a term search
with the new value.
It works just fine everywhere. But in production (which is very busy
sometimes), I've found documents with the same value. I've checked
the times when they were indexed, and there are days and hours between
those documents. I've added a check that all nodes were searched with no
errors during that term search, and if that check fails, I don't index
the document. But I still see from time to time that this term search
doesn't return a document with the value, and the system inserts a new
document with the same value.
Note: search is near-real time. By default, search's "view" on the
indexed data is refreshed once every second. So it is quite possible to
have a document which has been indexed (and which you can GET) but is
not visible to search.
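
The visibility gap described above can be pictured with a toy in-memory model. This is not Elasticsearch code: NearRealTimeIndex and its methods are hypothetical names, and the model only assumes that a GET sees a document immediately while search sees it after the next refresh:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only, NOT Elasticsearch code. It mimics the idea that a newly
// indexed value can be fetched right away but is searchable only after a
// refresh.
final class NearRealTimeIndex {
    private final Set<String> stored = new HashSet<>();     // what GET can see
    private final Set<String> searchable = new HashSet<>(); // what search can see

    void index(String value) { stored.add(value); }

    boolean get(String value) { return stored.contains(value); }

    boolean search(String value) { return searchable.contains(value); }

    // Makes everything indexed so far visible to search.
    void refresh() { searchable.addAll(stored); }
}
```

In this model, a check-by-search issued between index("foo") and the next refresh() misses the value, which is the window in which a duplicate can slip in.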
I don't know what refresh interval you have set, but it seems unlikely
that these docs were indexed hours or days before. A term search IS
reliable (although it is possible that you have some other bug in your
code which is interfering with that).
Either way, your approach for managing a unique field is incorrect. You
will always be subject to race conditions.
An approach you can use is similar to what I used in
http://blogs.perl.org/users/clinton_gormley/2011/10/elasticsearchsequence---a-blazing-fast-ticket-server.html
The only unique field in ES is the _id.
So you can have an index whose job it is to maintain a list of unique
values, stored in the _id field.
Eg, let's say you want to make sure that, for field 'my_val', the value
'foo' is not used elsewhere. You can have an index 'unique', with a type
'my_val'.
Try to create a document as:
- index: unique
- type: my_val
- id: foo
If it fails, then the value already exists.
(Of course, if the doc where you use that value is later deleted, then
you need to delete the unique doc as well)
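
The claim-then-release pattern above can be sketched with an in-memory stand-in. UniqueValueRegistry, tryClaim and release are hypothetical names, and the ConcurrentHashMap only plays the role of the 'unique' index, with the map key acting as the _id. Against a real cluster, the claim would be a create-only index request (op_type=create), which is rejected when the _id already exists:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the 'unique' index, NOT Elasticsearch code.
// Key = the value that must be unique (the _id), value = id of the document
// that uses it.
final class UniqueValueRegistry {
    private final ConcurrentMap<String, String> ids = new ConcurrentHashMap<>();

    // Atomic "create if absent": returns true only for the first caller that
    // claims the value, so there is no gap between checking and inserting.
    boolean tryClaim(String value, String ownerDocId) {
        return ids.putIfAbsent(value, ownerDocId) == null;
    }

    // Frees the value when its owning document is deleted; only the current
    // owner can release it.
    boolean release(String value, String ownerDocId) {
        return ids.remove(value, ownerDocId);
    }
}
```

The point of the design is that the existence check and the write are a single operation, unlike a term search followed by a separate index call.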
clint