Below are a couple of areas where I need more clarification. Any help is appreciated.
- We index data from multiple threads in parallel, and different threads may try to index data with the same id. This happens because the data arrives against different contexts that are managed outside the system. The contents of these contexts overlap and arrive in streams.
Question: When I index data with the same id in parallel on a cluster, will there be any exceptions? I would expect none, since it is just a re-index.
The version is not passed explicitly from outside.
The content is indexed and never updated on a specific attribute by a parallel thread, though a thread may index the same content with a different value for an attribute. The latter case is very rare.
- The _id that uniquely identifies a piece of content may contain data in the format below. Sometimes it is a GUID, sometimes it comes from a different algorithm.
Question: Are there any restrictions on the _id field, or any performance-related configuration that should be specified for the _id field at the mapping level?
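
To make the first point more concrete, here is a rough sketch of the access pattern in Python. It is only an illustration: `FakeIndex` is a hypothetical in-memory stand-in and not a real client API, the context names and document ids are made up, and the "overwrite and bump the version" behaviour is what I am assuming a re-index does, not something I have confirmed against the cluster.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeIndex:
    """Hypothetical in-memory stand-in for the real index (not a real client)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)
        self._lock = threading.Lock()

    def index(self, doc_id, body):
        # Assumption: writing an existing id replaces the whole document
        # and increments an internally managed version.
        with self._lock:
            version = self._docs.get(doc_id, (0, None))[0] + 1
            self._docs[doc_id] = (version, body)
            return version

    def get(self, doc_id):
        with self._lock:
            return self._docs[doc_id]


def index_context(index, context_name, doc_ids):
    # Each thread indexes every document of its context in full;
    # no version is passed in from outside.
    for doc_id in doc_ids:
        index.index(doc_id, {"source_context": context_name})


def run_demo():
    index = FakeIndex()
    # Two externally managed contexts whose contents overlap (doc-2, doc-3).
    contexts = {
        "context-a": ["doc-1", "doc-2", "doc-3"],
        "context-b": ["doc-2", "doc-3", "doc-4"],
    }
    with ThreadPoolExecutor(max_workers=len(contexts)) as pool:
        futures = [
            pool.submit(index_context, index, name, ids)
            for name, ids in contexts.items()
        ]
        for future in futures:
            future.result()  # re-raises any exception from a worker thread
    return index
```

In this sketch the overlapping ids are simply written twice and whichever thread writes last wins. What I would like to confirm is whether the real cluster behaves the same way, or whether concurrent writes to the same id can raise an exception.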