How many shards is this data distributed across? The min_doc_count is applied once the shards have returned results, so if the duplicates reside in different shards and are few they could be missed as the relevant data may not be returned from all shards. This is described in the docs.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.