Hi,
I am running a cluster with a single master node and 2 data nodes (6 shards, 1 replica).
I also specify a routing value when indexing documents.
After indexing about 20 GB of documents, I can see multiple documents with the same _id.
Can this happen? I assumed that IDs are unique, and that indexing several documents with the same ID but different content would overwrite the existing document and increment its _version.
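When indexing documents with a custom _routing, the uniqueness of the _id is not guaranteed across all of the shards in the index. The overwrite and _version increment only happen when a document with that _id already exists on the shard the routing value selects. Documents with the same _id can therefore end up on different shards if they are indexed with different _routing values.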
It is up to the user to ensure that IDs are unique across the index.
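To make the behaviour concrete, here is a small toy model in Python. It is only a sketch, not Elasticsearch code: the class ToyIndex and the function shard_for are made-up names, and zlib.crc32 stands in for the hash Elasticsearch really applies to the routing value. The only thing it is meant to show is that the _id lookup happens inside the one shard picked by the routing value.

```python
import zlib

NUM_SHARDS = 6  # matches the 6 primary shards in the question


def shard_for(routing, num_shards=NUM_SHARDS):
    # Assumption: crc32 is a stand-in hash used only for illustration;
    # Elasticsearch uses its own hash function on the routing value.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


class ToyIndex:
    """Hypothetical in-memory model: one dict of _id -> document per shard."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def index(self, doc_id, source, routing=None):
        # Without a custom routing value, the _id itself is the routing key.
        key = routing if routing is not None else doc_id
        shard = self.shards[shard_for(key, len(self.shards))]
        # The duplicate check only looks inside the selected shard.
        version = shard[doc_id]["_version"] + 1 if doc_id in shard else 1
        shard[doc_id] = {"_version": version, "_source": source}
        return version

    def copies(self, doc_id):
        # Return (shard number, document) for every shard holding this _id.
        return [(n, s[doc_id]) for n, s in enumerate(self.shards) if doc_id in s]
```

In this model, indexing _id "1" twice with the same routing value overwrites the document and bumps _version to 2. Indexing it again with a routing value that hashes to a different shard creates a second, independent copy with _version 1, which is the situation described in the question. Always using the same routing value for a given _id avoids this.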