_id field restrictions + parallel indexing conflicts

Karthik_Ramachandran · April 21, 2016, 9:25pm

All,

Below are a couple of areas where I need clarification. Any help is appreciated.

We are indexing data from multiple threads in parallel, and different threads could try to index data with the same id. The reason is that we get data against different contexts that are managed outside the system. The contents of these contexts overlap and arrive in streams.

Question: When I index data with the same id in parallel on a cluster, will there be any exceptions? There should not be, as it is a re-index.
NOTE:
Version is not passed explicitly from outside.
A parallel thread never updates a specific attribute of the content; it always indexes the content, though it may do so with different values for an attribute. The latter case is very rare.

The _id that uniquely identifies a piece of content may have any of the formats below. Sometimes it is a GUID, sometimes it comes from a different algorithm:
000015DCD378AEB62D6577008F74CE0D0D00000000000000
00002170CC0B770937CEDE89174373F03500000000000000
4420c7d30-56b765c7107-384142
46abefc10-46abefc117-72157665504465145
736b8362d3e8c953d3107219b76a7059
Question: Are there any restrictions on the _id field, or any performance-related configuration that should be specified for the _id field at the mapping level?

Thanks.

javanna · April 22, 2016, 4:38pm

The index api overwrites a document that already exists with the same index, type and id. That means that if you are sending documents with the same index, type and id, the last one that gets in wins and replaces all previous versions, unless you use some optimistic locking. Alternatively, you could switch to index-if-absent behaviour, so that only the first document is indexed and all the following ones are rejected. I guess what you do depends very much on your use case. Why do those documents have the same id? What is the right thing to do in that case?
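To make the three behaviours above concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Elasticsearch client: `ToyIndex`, `VersionConflict`, `run_parallel` and the context names are all hypothetical, and the single lock only stands in for the fact that writes to one id are applied one at a time. As far as I know, the real counterparts are the `op_type=create` and `version` parameters of the index request, which reject a conflicting write with a conflict error.

```python
import threading


class VersionConflict(Exception):
    """Hypothetical stand-in for a rejected (conflicting) write."""


class ToyIndex:
    """In-memory model of per-id write semantics; NOT the Elasticsearch client."""

    def __init__(self):
        self._docs = {}                 # _id -> (version, source)
        self._lock = threading.Lock()   # writes to one id are applied one at a time

    def index(self, doc_id, source, expected_version=None):
        """Plain index: overwrite and bump the version (last write wins).

        With expected_version set, act like optimistic locking and reject
        the write when the stored version differs.
        """
        with self._lock:
            current = self._docs.get(doc_id, (0, None))[0]
            if expected_version is not None and expected_version != current:
                raise VersionConflict(
                    f"{doc_id}: expected v{expected_version}, found v{current}"
                )
            self._docs[doc_id] = (current + 1, source)
            return current + 1

    def create(self, doc_id, source):
        """Index-if-absent: only the first writer for an id succeeds."""
        with self._lock:
            if doc_id in self._docs:
                raise VersionConflict(f"{doc_id}: already exists")
            self._docs[doc_id] = (1, source)
            return 1

    def get(self, doc_id):
        return self._docs[doc_id]


def run_parallel(write, contexts):
    """Call write(context) from one thread per context; return rejected contexts."""
    rejected = []

    def worker(ctx):
        try:
            write(ctx)
        except VersionConflict:
            rejected.append(ctx)

    threads = [threading.Thread(target=worker, args=(c,)) for c in contexts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rejected


DOC_ID = "736b8362d3e8c953d3107219b76a7059"      # one of the sample ids from the question
CONTEXTS = ["ctx-a", "ctx-b", "ctx-c", "ctx-d"]  # hypothetical context names

# Plain index: every thread succeeds, the document ends up at version 4.
overwrite = ToyIndex()
rejected_overwrite = run_parallel(lambda c: overwrite.index(DOC_ID, {"from": c}), CONTEXTS)

# Index-if-absent: exactly one thread succeeds, the other three are rejected.
first_wins = ToyIndex()
rejected_create = run_parallel(lambda c: first_wins.create(DOC_ID, {"from": c}), CONTEXTS)

print(len(rejected_overwrite), overwrite.get(DOC_ID)[0])  # 0 4
print(len(rejected_create), first_wins.get(DOC_ID)[0])    # 3 1
```

With plain indexing no thread sees an error, but which context's copy survives depends on arrival order. With index-if-absent the first copy is kept and the callers have to be prepared to treat the rejection as a normal outcome rather than a failure.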

Karthik_Ramachandran · April 22, 2016, 7:09pm

Thanks Luca. Yes, agreed that index overwrites and the last one wins. I just want to check whether it will cause any issues. Regarding your question on why these documents have the same ID: the same document can come in against multiple contexts, and the pulls against these contexts happen in parallel.

BTW, any thoughts/inputs on my second question, about the values used for the ID?

javanna · April 22, 2016, 8:24pm

Sorry, I had missed your second question. I am not aware of any restriction on the value of the _id field.

Cheers
Luca
