From time to time, I’ve seen bulk index requests fail because of version conflicts. Given how our system works, I’m OK with some docs failing to index because they already exist. However, in these cases, do all the documents get rejected, or just those with conflicts?
For example, let’s say I am bulk-indexing 50 documents and 5 of them have version conflicts. This results in the bulk index request returning a 409 version_conflict_engine_exception. Will the other 45 documents in the same bulk index request go through, or will all of them fail?
If all of them fail, is there a parameter that tells ES to index the other 45 documents anyway? I basically want to index everything in the request and ignore the failing ones.
Just the ones with conflicts. The response to the bulk request gives a doc-by-doc breakdown of what happened and will indicate which docs were successful and which ones weren't.
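To make that doc-by-doc breakdown concrete, here is a minimal Python sketch of how a client could sort the results. It assumes the parsed JSON body of a bulk response, where each entry in `items` has a single key naming the action (`index`, `create`, `update`, or `delete`) and failed entries carry an `error` object with a `type`. The function name `summarize_bulk_response` and the three bucket names are made up for illustration and are not part of any client library.

```python
from typing import Any, Dict, List

CONFLICT_TYPE = "version_conflict_engine_exception"


def summarize_bulk_response(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group document IDs from a bulk response by outcome.

    Hypothetical helper: returns IDs under "ok", "conflict", and "failed".
    """
    summary: Dict[str, List[str]] = {"ok": [], "conflict": [], "failed": []}
    for item in response.get("items", []):
        # Each item looks like {"index": {...}} or {"create": {...}}, etc.
        _action, result = next(iter(item.items()))
        doc_id = result.get("_id")
        error = result.get("error")
        if error is None:
            summary["ok"].append(doc_id)
        elif result.get("status") == 409 and error.get("type") == CONFLICT_TYPE:
            # The document already exists; safe to ignore in this use case.
            summary["conflict"].append(doc_id)
        else:
            # Anything else (mapping errors, rejections, ...) needs attention.
            summary["failed"].append(doc_id)
    return summary
```

With something like this, the IDs under `conflict` can be dropped silently, and only the IDs under `failed` need to be logged or retried.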
That’s good to know. In that case, I guess I won’t have to do anything. In our app, it’s possible for the same document (with the same content and doc ID) to be indexed from multiple sources. If I understand correctly, this means a document with the same doc ID already exists in the cluster, which causes the version conflict.
Is it possible to pass a param that tells ES not to return an error on a version conflict? I know that is possible for other endpoints, but I don’t see it as an option for bulk index.