Maintain a unique field while indexing - equivalent to a UNIQUE INDEX in a relational database


(Tomer Praizler) #1

Hey,

Is there a way to use one of my document fields as a unique id, making sure there will never be 2 documents with the same value in that field?

Here are 2 documents for example:

   {
       "name": "name1",
       "email":"firstEmail@gmail.com"
   },
   {
       "name": "name2",
       "email":"seconddEmail@gmail.com"
   }

I want to "define" the email field as a unique identifier.
To get closer to this goal, I search Elasticsearch before indexing for a document with the same email, and when I can't find that email I index the document and refresh the index.
This is of course not bulletproof, since two requests with the same email can both pass the search before either one is indexed, but it covers 99.999% of the requests.

I want to make it more accurate and make sure I never index 2 documents with the same email.
Is there a better way to do it?

Thanks!


(Nik Everett) #2

Making the document's id the email would do it.
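
As a rough illustration of that suggestion, here is a minimal sketch that builds (but does not send) an index request using the email as the document id. It assumes the `op_type=create` parameter of the index API, which asks Elasticsearch to reject the write with a 409 conflict when the id already exists. The helper name `build_create_request`, the `users` index, the `_doc` path segment, and the lowercasing of the email are all assumptions made for the example, not something from this thread.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_create_request(base_url, index, doc, doc_type="_doc"):
    """Build (but do not send) a create-only index request keyed by email.

    Hypothetical helper: index name, doc_type and normalisation are assumptions.
    """
    # Lowercasing is an assumption; drop it if case matters for your emails.
    # quote(..., safe="") percent-encodes "@" and "/" so the id is a single path segment.
    doc_id = quote(doc["email"].strip().lower(), safe="")
    # op_type=create: fail instead of overwriting when the id already exists.
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?op_type=create"
    return Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_request(
    "http://localhost:9200",
    "users",
    {"name": "name1", "email": "firstEmail@gmail.com"},
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and treating a 409 response as "this email is already taken" would replace the search-then-index step from the original question.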


(Tomer Praizler) #3

Can you please explain why?
If I send 100 index requests in one second, with the same email address as their id, will it work?
As I understand it, this is not an atomic operation.


(Nik Everett) #4

Elasticsearch tries not to let you make two documents with the same type:id. Updating a single document is atomic on each copy of the shard on which it lives. Each copy may apply the update at a different time, though, and the updates become visible to search (refresh) at different times as well. Elasticsearch does some tricks where it reads the document out of the index if it's been refreshed or out of the translog if it hasn't, but they amount to optimistic concurrency control and are exposed through a version parameter on index commands.

You can cheat the document uniqueness using routing or parent/child. That is known and documented. Otherwise, if you can convince it to let you make two documents with the same type:id, then it's a bug.

It's certainly possible that there are bugs related to network partitions where Elasticsearch can get confused. Those bugs are actively being worked on, though.
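
To make the "100 requests in one second" case concrete, here is a toy in-memory model of a create-only write. It is not Elasticsearch code, and the `ToyShard` class is purely hypothetical; it only illustrates the idea that when the existence check and the write happen as one atomic step per id, concurrent creates with the same id cannot both succeed.

```python
import threading


class ToyShard:
    """In-memory stand-in for a single shard copy; NOT Elasticsearch code."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def create(self, doc_id, doc):
        # The existence check and the write happen under one lock,
        # so no other writer can slip in between them.
        with self._lock:
            if doc_id in self._docs:
                return False  # comparable to a conflict response
            self._docs[doc_id] = doc
            return True


shard = ToyShard()
results = []


def worker(n):
    # Every worker tries to create a document with the same id (the email).
    results.append(shard.create("firstEmail@gmail.com", {"name": f"name{n}"}))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), results.count(False))  # prints "1 99"
```

The search-then-index approach from the original question lacks this property, because the check and the write are separate requests and another writer can land in between.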


(Tomer Praizler) #5

Cool!

I understand your suggestion, but is there a way to do it without setting the id to be the email?
I want to keep the auto-generated id and use a separate field for email.

Is my approach (from the original question) valid? Is there a better way to do it?

