FSCrawler Ingestion Issue

I have installed FSCrawler using a user account on a server, and I am now trying to use a service account to create a job and ingest the data. Do I need to use the same account that installed FSCrawler to ingest the data?

Also, the data I am trying to ingest is around 8-9 TB; so far the largest ingestion that has worked for me is about 3 TB.

The account that runs FSCrawler needs read access to the files you want to scan and write access to the FSCrawler configuration directory.
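A quick way to sanity-check those two permissions before starting FSCrawler is a small shell script. The paths below are stand-ins (here backed by temporary directories so the sketch runs anywhere); point `DATA_DIR` and `CONFIG_DIR` at your real data share and `~/.fscrawler` directory.

```shell
#!/bin/sh
# Sketch: verify the effective access of the account that will run
# FSCrawler. DATA_DIR and CONFIG_DIR are hypothetical defaults;
# override them via environment variables for a real check.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"      # stand-in for the share to crawl
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"  # stand-in for ~/.fscrawler

# FSCrawler needs to read the data...
if [ -r "$DATA_DIR" ]; then echo "data: readable"; else echo "data: NOT readable"; fi
# ...and write its status/config files.
if [ -w "$CONFIG_DIR" ]; then echo "config: writable"; else echo "config: NOT writable"; fi
```

Run it as the service account (e.g. via `sudo -u svc-account sh check.sh`) to confirm the grants before the first crawl.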

Did the job progress? Anything in the logs?

Yes, the data ingestion is in progress, but I broke it down into smaller chunks of 2-3 TB each, all ingesting into one main index. It took around 3-4 days to ingest each 2-3 TB chunk.

I have created 5 separate FSCrawler services, and they have been running smoothly ever since; no issues identified yet.
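For reference, running each chunk as its own service can be sketched with a systemd unit like the one below. The service account name, install path, and job name are assumptions; substitute your own.

```ini
# Hypothetical unit, e.g. /etc/systemd/system/fscrawler-job1.service
[Unit]
Description=FSCrawler job1
After=network.target

[Service]
# Run as the service account, not the account that installed FSCrawler
User=svc-fscrawler
ExecStart=/opt/fscrawler/bin/fscrawler job1
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

One unit per job keeps the five crawls independently restartable.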

Does FSCrawler pick up the `lastmodifieddate` from the settings.json as the starting point for ingestion?

FSCrawler uses a `_status.json` file to store the timestamp of when it last started a run.

This date is then compared against the files on the next crawl to see whether there have been any changes.
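The idea behind that comparison can be illustrated in shell with `find -newer`: only files modified after a stored timestamp are selected for reprocessing. This is just a sketch of the concept, not FSCrawler's actual code; the file names are made up.

```shell
#!/bin/sh
# STAMP stands in for the stored last-run timestamp; DIR for the crawled tree.
STAMP=$(mktemp)
DIR=$(mktemp -d)

sleep 1                          # ensure a later mtime than the stamp
echo "new" > "$DIR/changed.txt"  # a file modified after the last run

# Select only files newer than the stamp, i.e. changed since the last run.
find "$DIR" -type f -newer "$STAMP"
```

Only `changed.txt` is printed; files older than the stamp would be skipped, which is why a stale or missing timestamp triggers a full re-crawl.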

I hope this clarifies.

Yes, this helps. I stopped my FSCrawler service two days ago, but it still has not created the `_status.json` file. Does it take time to create the file?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.