A brief search across the history of the discuss forum I'm currently typing this answer in revealed the following threads which might be helpful for you:
Hi,
Thank you for your reply.
My definition of a duplicate is the same document, with the same value in the unique field, stored more than once in the same index.
This situation was caused by problems while inserting the data. I now need an efficient way to find all the duplicates and delete them, leaving only one copy of each. It would be best if you could help us create a query that finds those duplicates and deletes them.
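For reference, one possible approach is sketched below in Python. It is only a sketch under assumptions, not a tested recipe: the index name `my-index` and the field name `unique_field` are hypothetical, and the field is assumed to be mapped as a keyword or numeric type so that a `terms` aggregation can run on it. The idea is to group documents by the unique field with `min_doc_count: 2`, fetch the IDs of each group with a `top_hits` sub-aggregation, keep the first copy, and build a `_bulk` payload that deletes the rest. The code only builds request bodies and parses a response, so it can be used with whatever client you already have.

```python
import json

# Hypothetical names: replace with your own index and unique field.
INDEX = "my-index"
UNIQUE_FIELD = "unique_field"  # assumed to be a keyword or numeric field


def duplicates_query(field, max_groups=1000, max_copies=100):
    """Search body: group by `field`, keep only values that occur at least twice."""
    return {
        "size": 0,  # no top-level hits needed, only the aggregation
        "aggs": {
            "dupes": {
                "terms": {"field": field, "min_doc_count": 2, "size": max_groups},
                "aggs": {
                    # IDs of the documents sharing this value
                    "copies": {"top_hits": {"size": max_copies, "_source": False}}
                },
            }
        },
    }


def ids_to_delete(response):
    """Keep the first copy in each bucket; return (index, id) pairs for the rest."""
    doomed = []
    for bucket in response["aggregations"]["dupes"]["buckets"]:
        hits = bucket["copies"]["hits"]["hits"]
        doomed.extend((h["_index"], h["_id"]) for h in hits[1:])
    return doomed


def bulk_delete_body(doomed):
    """NDJSON payload for the _bulk API: one delete action per line."""
    lines = [json.dumps({"delete": {"_index": i, "_id": d}}) for i, d in doomed]
    return "\n".join(lines) + "\n" if lines else ""
```

You would send `duplicates_query(UNIQUE_FIELD)` to the `_search` endpoint of the index, pass the response to `ids_to_delete`, and post the result of `bulk_delete_body` to `_bulk`. Because `max_groups` and `max_copies` cap what a single search returns, the steps would need to be repeated until the search returns no more buckets. It is worth trying this on a copy of the data first.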