I installed FSCrawler on a server using one user account and now want to use a separate service account to create a job and ingest the data. Do I need to use the same account that installed FSCrawler in order to ingest?
Also, the dataset I am trying to ingest is around 8-9 TB; the largest ingestion that has worked for me so far is about 3 TB.
The account that runs FSCrawler needs read access to the files you want to scan and write access to the FSCrawler configuration directory.
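A quick way to verify both permissions is to run a check as the service account before starting the job. This is a sketch with placeholder paths (`/data/docs` for the scan root, `~/.fscrawler` for the default config dir); adjust them to your environment:

```shell
#!/bin/sh
# Run as the service account that will launch FSCrawler.
# Placeholder paths -- substitute your actual scan root and config dir.
DOCS=/data/docs
CONF="$HOME/.fscrawler"

# Read access on the tree to scan:
[ -r "$DOCS" ] && echo "read on $DOCS: ok" || echo "read on $DOCS: MISSING"

# Write access on the FSCrawler configuration directory:
[ -w "$CONF" ] && echo "write on $CONF: ok" || echo "write on $CONF: MISSING"
```

If either check fails, grant access with `chmod`/`chown` or an ACL before running FSCrawler under that account.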
Yes, the data ingestion is in progress, but I broke it down into smaller chunks of 2-3 TB each, all ingested into one main index. Each 2-3 TB chunk took around 3-4 days to ingest.
I have created 5 separate FSCrawler services, and they have been running smoothly ever since; no issues identified yet.
Does FSCrawler use the lastmodifieddate from settings.json as the starting point for ingestion?
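For reference, the per-job settings file the question refers to looks roughly like this in FSCrawler (a sketch only; field names follow the FSCrawler docs, and the job name, path, and index name are placeholders):

```json
{
  "name": "my_job",
  "fs": {
    "url": "/data/docs",
    "update_rate": "15m"
  },
  "elasticsearch": {
    "index": "main_index"
  }
}
```

Note that this file holds the job configuration (what to scan and how often), not the crawl progress; check the FSCrawler documentation for where the last-run state is persisted for your version.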