We are using a data generator to send records to a three-node Elasticsearch cluster, using the Java API to perform bulk insert operations of 1,000 records each. We had previously used 2.1.1 and are now using 2.2.0. We are currently undertaking some failure testing whereby we kill a master or a standard node to see the impact (if any) and log the results. We note down any failed operations along with the successful ones and confirm the total number of written records using kopf.
Previously, with 2.1.1, we noticed that when we killed a master during a bulk insert there were some missing records. The shortfall always seemed to be less than 1,000, and we presumed this was because only part of the bulk operation had been written, so a successful confirmation could not be given. This resulted in some missing records in our counts at the end. With 2.2.0, we found the opposite: extra records were recorded. Although the bulk operation returned a failure, a number of its records had actually been written.
You mention that you're using the Java API, so I assume you bulk index via the BulkProcessor API. If that's the case, we changed one thing between 2.1.x and 2.2: we introduced an automatic backoff (retry) in case we get an EsRejectedExecutionException. To be honest, I'd be a bit surprised if this explained the behavior you're describing, but you can try adding the following line when creating the BulkProcessor (provided you even use it):
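The relevant line is the `setBackoffPolicy` call. A minimal sketch of where it goes, assuming the 2.2 BulkProcessor builder API; the helper method name, no-op listener, and batch size of 1,000 are just illustrative placeholders for your own setup:

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

static BulkProcessor buildProcessor(Client client) {
    return BulkProcessor.builder(client, new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) { }

        @Override
        public void afterBulk(long executionId, BulkRequest request, BulkResponse response) { }

        @Override
        public void afterBulk(long executionId, BulkRequest request, Throwable failure) { }
    })
    .setBulkActions(1000)                        // flush every 1,000 records, matching your test
    .setBackoffPolicy(BackoffPolicy.noBackoff()) // disable the automatic retry added in 2.2
    .build();
}
```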
This will restore the previous behavior of BulkProcessor.
Anyway, is there any chance you can share (a potentially stripped-down version of) the source code of your test client? That would make it easier to spot or reproduce the problem you're describing.
Do you use random doc IDs with more than one concurrent bulk request (async) and with the default quorum write consistency?
In that case, duplicate documents can occur, because the new 2.2 back-off feature assumes that all bulk items that fail with EsRejectedExecutionException can be safely retried, on the assumption that they never reached a shard. If an item had in fact been written before the failure was reported, retrying it under a new auto-generated ID creates a duplicate.
If you want to avoid duplicate documents, you should either use non-random doc IDs, so that a retried item overwrites the same document instead of creating a new one, or disable the back-off feature. I'm not a fan of the new feature, because it hides errors and adds complexity to the client.
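For the first option, a hedged sketch (the index name, type, and the idea that each record carries a stable key are assumptions about your generator):

```java
import org.elasticsearch.action.index.IndexRequest;

// With an explicit, stable document ID, a retried bulk item simply
// overwrites the same document rather than indexing a duplicate under
// a fresh auto-generated ID. 'bulkProcessor' is the processor built earlier.
String recordId = "record-42";            // stable key from your generator (hypothetical)
String json = "{\"field\":\"value\"}";    // the record payload as JSON

bulkProcessor.add(new IndexRequest("myindex", "mytype", recordId).source(json));
```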