_id field restrictions + parallel indexing conflicts

All,

Below are a couple of areas where I need clarification. Any help is appreciated.

  1. We are indexing data from multiple threads in parallel, and different threads may try to index data with the same id. This happens because we receive data against different contexts that are managed outside the system. The contents of those contexts overlap and arrive as streams.

Question: When I index data with the same id in parallel on a cluster, will there be any exceptions? I would expect not, as it is just a re-index.
NOTE:
The version is not passed explicitly from outside.
Parallel threads only index the full content; they never update a specific attribute, though they may index the same content with different values for an attribute. The latter case is very rare.
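To illustrate why no exception is expected, here is a toy model, assuming Elasticsearch's default internal versioning, in which every index request for an existing id replaces the whole document and increments its `_version`. `ToyIndex` and `index_in_parallel` are hypothetical names for this sketch, not an Elasticsearch client API, and the single lock simply serializes writes so that each one lands intact.

```python
import threading


class ToyIndex:
    """Hypothetical in-memory stand-in for an index; NOT an Elasticsearch client.

    Every write to an id replaces the whole document and bumps its version
    by one, so concurrent writes to the same id simply mean last write wins.
    """

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)
        self._lock = threading.Lock()  # serializes writes in this toy model

    def index(self, doc_id, source):
        with self._lock:
            version = self._docs.get(doc_id, (0, None))[0] + 1
            self._docs[doc_id] = (version, dict(source))
            return version

    def get(self, doc_id):
        return self._docs.get(doc_id)


def index_in_parallel(store, doc_id, sources):
    """Index every source under the same id from its own thread; collect errors."""
    errors = []

    def worker(src):
        try:
            store.index(doc_id, src)
        except Exception as exc:  # record anything unexpected instead of dying
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(s,)) for s in sources]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


store = ToyIndex()
doc_id = "736b8362d3e8c953d3107219b76a7059"
sources = [{"context": f"ctx-{i}", "title": "same content"} for i in range(8)]
errors = index_in_parallel(store, doc_id, sources)
version, source = store.get(doc_id)
print(errors, version, source["title"])  # -> [] 8 same content
```

Which thread's `context` value survives is nondeterministic; only the fact that exactly one document remains, with one version per write, is stable.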

  2. The _id that uniquely identifies a piece of content can take any of the formats below. Sometimes it is a GUID, sometimes it comes from a different algorithm:
    000015DCD378AEB62D6577008F74CE0D0D00000000000000
    00002170CC0B770937CEDE89174373F03500000000000000
    4420c7d30-56b765c7107-384142
    46abefc10-46abefc117-72157665504465145
    736b8362d3e8c953d3107219b76a7059
    Question: Are there any restrictions on the _id field, or any performance-related configuration that should be specified for _id at the mapping level?
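One concrete check worth doing is on size: later Elasticsearch versions document a limit of 512 bytes (UTF-8) for an `_id` value, so it may be worth confirming the limit for the version in use. The sketch below, with a hypothetical helper `id_size_ok`, checks the sample ids above against that assumed limit. Note that the limit counts bytes, not characters.

```python
# Assumed limit: 512 bytes, as documented for _id in later Elasticsearch
# versions. Confirm it for the version you actually run.
MAX_ID_BYTES = 512


def id_size_ok(doc_id: str, max_bytes: int = MAX_ID_BYTES) -> bool:
    """True if doc_id is non-empty and its UTF-8 encoding fits in max_bytes."""
    return 0 < len(doc_id.encode("utf-8")) <= max_bytes


sample_ids = [
    "000015DCD378AEB62D6577008F74CE0D0D00000000000000",
    "00002170CC0B770937CEDE89174373F03500000000000000",
    "4420c7d30-56b765c7107-384142",
    "46abefc10-46abefc117-72157665504465145",
    "736b8362d3e8c953d3107219b76a7059",
]
print(all(id_size_ok(i) for i in sample_ids))  # -> True
print(id_size_ok("\u00e9" * 300))  # 300 characters but 600 bytes -> False
```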

Thanks.

The index API overwrites a document that already exists with the same index, type and id. That means that if you send documents with the same index, type and id, the last one that gets in wins and replaces all previous versions, unless you use some form of optimistic locking. Alternatively, you could switch to index-if-absent behaviour, so that only the first document is indexed and all following ones are rejected. What you do depends very much on your use case. Why do those documents have the same id? What is the right thing to do in that case?
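The three behaviours described above can be sketched with a toy in-memory model. In Elasticsearch itself, index-if-absent corresponds to `op_type=create` (the `_create` endpoint), and optimistic locking uses the `version` parameter in older releases or `if_seq_no`/`if_primary_term` in newer ones; both reject a losing write with a 409 conflict. The classes below are hypothetical and only mimic those semantics; they are not a client API, and `expected_version` is a simplification of the real version checks.

```python
class VersionConflict(Exception):
    """Toy stand-in for the 409 version-conflict response."""


class ToyIndex:
    """Hypothetical in-memory model of per-id write semantics; not a real client."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        """Plain index: overwrite and bump the version (last write wins).

        With expected_version set, act like optimistic locking: reject the
        write unless the stored version still matches what the caller read.
        """
        current_version = self.docs[doc_id][0] if doc_id in self.docs else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, found {current_version}")
        self.docs[doc_id] = (current_version + 1, source)
        return current_version + 1

    def create(self, doc_id, source):
        """Index-if-absent: only the first write for an id succeeds."""
        if doc_id in self.docs:
            raise VersionConflict(f"{doc_id}: document already exists")
        self.docs[doc_id] = (1, source)
        return 1


idx = ToyIndex()
rejected = []
idx.index("doc-1", {"ctx": "A"})
idx.index("doc-1", {"ctx": "B"})  # same id again: silently overwrites
print(idx.docs["doc-1"])  # -> (2, {'ctx': 'B'})

for attempt in (
    lambda: idx.index("doc-1", {"ctx": "C"}, expected_version=1),  # stale read
    lambda: idx.create("doc-1", {"ctx": "D"}),  # id already taken
):
    try:
        attempt()
    except VersionConflict as exc:
        rejected.append(str(exc))

print(len(rejected), idx.docs["doc-1"])  # -> 2 (2, {'ctx': 'B'})
```

Which mode fits depends on whether a later copy of the same content should replace the earlier one (plain index), be rejected outright (create), or only replace it when the writer saw the latest version (optimistic locking).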

Thanks Luca. Yes, I agree that indexing overwrites the existing document and the last write wins, as applicable. I just want to check whether it will cause any issues. Regarding your question on why these documents have the same id: the same document can come in against multiple contexts, and the pulls against these contexts happen in parallel.

BTW, any thoughts on my second question, about the value of the _id?

Sorry, I missed your second question. I am not aware of any restriction on the value of the _id field.

Cheers
Luca