I have left the update rate at the default of 15 minutes in FSCrawler.
name: "index"
fs:
  update_rate: "15m"
I would like to know whether the new scanning cycle after 15 minutes covers all documents or only the documents modified since the previous scan.
FSCrawler pauses for 15 minutes. It then scans the whole directory and its subdirectories again and compares each file's date with the date of the last run. If a file has been created or modified since then, the corresponding document is created or updated in Elasticsearch.
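To illustrate the idea, here is a minimal Python sketch of that kind of incremental scan. It is not FSCrawler's actual code (FSCrawler is written in Java); the function name `find_changed_files` and the `last_run` parameter are made up for this example, and it assumes that a file's modification time is what gets compared against the last run date.

```python
import os


def find_changed_files(root, last_run):
    """Walk root and its subdirectories; return files modified after last_run.

    last_run is a timestamp in seconds since the epoch (hypothetical parameter).
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file is still visited, but only files newer than the
            # previous run are selected for (re)indexing.
            if os.stat(path).st_mtime > last_run:
                changed.append(path)
    return sorted(changed)
```

Note that the walk still touches every file to read its date, so the cost of a cycle has two parts: listing the tree (always paid) and indexing the selected files (paid only for new or modified ones).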
So if a file has not been updated, FSCrawler skips that document, right? And if only a few documents were added, the second scanning cycle should be shorter, shouldn't it?
Yes.
Great! Thank you.