The processors generally work within the context of a single document, so they do not have access to other documents already in the index. If you are looking to avoid duplicates, you can do this by assigning each document a predictable ID, so that a duplicate arriving later causes an update of the existing document instead of creating a new one. This is described in this old blog post.
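As a rough illustration of the predictable-ID idea, here is a minimal Python sketch. It assumes the ID is derived by hashing the fields that define a duplicate; the helper names (`predictable_id`, `index_document`), the field names, and the plain dict standing in for the index are all hypothetical and not part of any Elasticsearch API. Against a real cluster you would instead send the computed value as the document's `_id` when indexing.

```python
import hashlib
import json


def predictable_id(doc, key_fields):
    """Derive a deterministic ID from the fields that define a duplicate."""
    # Serialize only the chosen fields with sorted keys, so the same values
    # always yield the same string regardless of key order in the dict.
    key = {field: doc.get(field) for field in key_fields}
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_document(index, doc, key_fields):
    """Store doc under its predictable ID; a duplicate overwrites in place.

    `index` is a plain dict used as a stand-in for a real index.
    Returns (doc_id, created), where created is False for an update.
    """
    doc_id = predictable_id(doc, key_fields)
    created = doc_id not in index
    index[doc_id] = doc
    return doc_id, created


if __name__ == "__main__":
    index = {}
    fields = ["host", "timestamp", "message"]  # hypothetical duplicate key
    event = {"host": "web-1", "timestamp": "2020-01-01T00:00:00Z", "message": "started"}
    print(index_document(index, event, fields))
    # The same event arriving again maps to the same ID and becomes an update.
    print(index_document(index, dict(event), fields))
    print(len(index))
```

Which fields go into the hash is the main design choice: include too few and distinct documents collide, include volatile ones (such as an ingest timestamp) and duplicates no longer map to the same ID.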