I installed FSCrawler on a server using one user account and now want to use a separate service account to create a job and ingest the data. Do I need to use the same account that installed FSCrawler in order to ingest?
Also, the dataset I am trying to ingest is around 8-9 TB; the largest ingestion that has worked for me so far is about 3 TB.
The account that runs FSCrawler needs read access to the files you want to scan and write access to the FSCrawler configuration directory.
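A quick way to verify both permissions is to run a check as the service account before starting the job. This is a sketch with placeholder paths (`/data/docs` for the scan root, `~/.fscrawler` for the default config dir); adjust them to your environment:

```shell
#!/bin/sh
# Run as the service account that will launch FSCrawler.
# Placeholder paths -- substitute your actual scan root and config dir.
DOCS=/data/docs
CONF="$HOME/.fscrawler"

# Read access on the tree to scan:
[ -r "$DOCS" ] && echo "read on $DOCS: ok" || echo "read on $DOCS: MISSING"

# Write access on the FSCrawler configuration directory:
[ -w "$CONF" ] && echo "write on $CONF: ok" || echo "write on $CONF: MISSING"
```

If either check fails, grant access with `chmod`/`chown` or an ACL before running FSCrawler under that account.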
Yes, the data ingestion is in progress, but I broke it down into smaller chunks of 2-3 TB each, all ingested into one main index. Each 2-3 TB chunk took around 3-4 days to ingest.
I have created 5 separate FSCrawler services, and they have been running smoothly ever since; no issues identified yet.
Does FSCrawler use the lastmodifieddate from settings.json as the starting point for ingestion?
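For reference, the per-job settings file the question refers to looks roughly like this in FSCrawler (a sketch only; field names follow the FSCrawler docs, and the job name, path, and index name are placeholders):

```json
{
  "name": "my_job",
  "fs": {
    "url": "/data/docs",
    "update_rate": "15m"
  },
  "elasticsearch": {
    "index": "main_index"
  }
}
```

Note that this file holds the job configuration (what to scan and how often), not the crawl progress; check the FSCrawler documentation for where the last-run state is persisted for your version.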