Maintain a unique field while indexing - equivalent to a UNIQUE INDEX in a relational database

tpraizler · December 23, 2015, 7:08pm

Hey,

Is there a way to use one of my document fields as a unique id, making sure there will never be 2 documents with the same value in that field?

Here are 2 documents for example:

   {
       "name": "name1",
       "email":"firstEmail@gmail.com"
   },
   {
       "name": "name2",
       "email":"seconddEmail@gmail.com"
   }

I want to "define" the email field as a unique identifier.
To get closer to this goal, I search Elasticsearch for a document with the same email before indexing. If I can't find that email, I index the document and refresh the index.
This is of course not bulletproof, but it covers 99.999% of the requests.

I want to make it more accurate, and make sure I will never index 2 documents with the same email.
Is there a better way to do it?

Thanks!

nik9000 · December 23, 2015, 7:29pm

Making the document's id the email would do it.
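As a rough sketch of what that could look like: the helper below only builds the request and does not send it. The function name, the `users`/`user` index and type names, and the choice to trim and lowercase the email are assumptions for illustration, not something from this thread. The `op_type=create` query parameter is a real Elasticsearch option that makes the request fail with a 409 Conflict if a document with that id already exists, instead of overwriting it.

```python
import json
from urllib.parse import quote


def build_create_request(base_url, index, doc_type, doc):
    """Return (method, url, body) for a create-only index request.

    The email becomes the document id. Normalising it (strip + lowercase)
    is an assumption here, so that "A@x.com" and "a@x.com" map to one id.
    """
    doc_id = quote(doc["email"].strip().lower(), safe="")
    # op_type=create asks Elasticsearch to reject the request if the id exists.
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?op_type=create"
    return "PUT", url, json.dumps(doc)


method, url, body = build_create_request(
    "http://localhost:9200",  # hypothetical local node
    "users",                  # hypothetical index name
    "user",                   # hypothetical type name
    {"name": "name1", "email": "firstEmail@gmail.com"},
)
print(method, url)
```

Without `op_type=create`, a second index request with the same id would replace the first document and bump its version rather than being rejected, so the parameter matters if the goal is "first writer wins".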

tpraizler · December 23, 2015, 8:51pm

Can you please explain why?
If I send 100 index requests in one second, all with the same email address as their id, will it work?
This is not an atomic operation, as I understand it.

nik9000 · December 23, 2015, 9:55pm

Elasticsearch tries not to let you make two documents with the same type:id. Updating a single document is atomic on each copy of the shard on which it lives. Each copy may apply the update at a different time, though, and the update becomes visible to search (refresh) at different times as well. Elasticsearch does some tricks where it reads the document out of the index if it's been refreshed or out of the translog if it hasn't, but they amount to optimistic concurrency control and are exposed through a version parameter on index commands.

You can cheat the document uniqueness using routing or parent/child. That is known and documented. Otherwise, if you can convince it to let you make two documents with the same type:id, then it's a bug.

It's certainly possible that there are bugs related to network partitions where Elasticsearch can get confused. Those bugs are actively being worked on, though.
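To make the "100 requests in one second" case concrete, here is a toy in-memory model of create-only semantics on a single shard copy. It is not Elasticsearch code: `ToyShard` and `race` are made-up names, and the lock simply stands in for the per-document atomicity described above. It ignores replicas, refresh, and network partitions entirely.

```python
import threading


class ToyShard:
    """In-memory stand-in for one shard copy; an illustration only."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def create(self, doc_id, source):
        """Store source under doc_id only if the id is unused.

        Returns True on success and False if the id was already taken
        (where a real create-only request would get a 409 Conflict).
        """
        with self._lock:  # check and write happen as one atomic step
            if doc_id in self._docs:
                return False
            self._docs[doc_id] = source
            return True


def race(shard, doc_id, attempts=100):
    """Fire `attempts` concurrent create calls for the same id."""
    results = []

    def worker(n):
        results.append(shard.create(doc_id, {"name": f"name{n}"}))

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = race(ToyShard(), "firstEmail@gmail.com")
print(results.count(True), "created,", results.count(False), "rejected")
```

The point of the sketch is that the existence check and the write happen in one step on the shard, which is what a separate search-then-index sequence from the client cannot guarantee.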

tpraizler · December 23, 2015, 11:02pm

Cool!

I understand your suggestion, but is there a way to do it without setting the id to be the email?
I want to keep the auto-generated id, and use a separate field for email.

Is my suggestion (from the original question) valid? Is there a better way to do it?
