Fscrawler not indexing all files

morningkaren · June 10, 2020, 8:09pm

Hello,

I am having trouble getting fscrawler to index all my files. For example, I am trying to index all 7701 files, but only 2627 were indexed after a full day, and the count has been stuck at 2627 today.

I took some of the 7701 files that were not indexed, put them into another folder, and ran fscrawler on that folder. This time fscrawler was able to index those files.

So I'm confused as to why fscrawler could not index the same files in the original folder.

Other things to consider:
I am using Tesseract-OCR, so maybe it is just really slow? But I didn't think it would stay stuck at 2627 files for over 8 hours.

Also, because indexing had stopped at 2627 files, I "restarted" it: I stopped fscrawler and ran it again with the --trace and --restart options.

(I also cleared up disk space. When I first "restarted" fscrawler, I ran into a read-only error, so I used Kibana to set "read_only_allow_delete": "false". After that, fscrawler was able to run with the --restart option. I suspect the error happened because there wasn't much space left on the drive.)
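For reference, Elasticsearch puts the `index.blocks.read_only_allow_delete` block on indices when disk usage passes the flood-stage watermark (95% by default). The sketch below builds the request that clears it, the same `PUT <index>/_settings` call you would run from Kibana Dev Tools. It assumes an unsecured cluster at `http://localhost:9200` and uses a hypothetical index name `my_job`; sending `null` removes the block instead of storing an explicit value. Free disk space first, otherwise Elasticsearch applies the block again.

```python
import json
import urllib.request

# Assumption: a local cluster without authentication; adjust for your setup.
ES_URL = "http://localhost:9200"


def build_unblock_request(index: str, es_url: str = ES_URL) -> urllib.request.Request:
    """Build a PUT <index>/_settings request that removes the read-only block.

    A None value is serialised as JSON null, which resets the setting
    to its default (no block).
    """
    body = {"index.blocks.read_only_allow_delete": None}
    return urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Nothing is sent when the request is built; pass it to `urllib.request.urlopen(build_unblock_request("my_job"))` once the disk has room again.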

Thanks for your help!

Best,

Karen

dadoonet · June 26, 2020, 8:55am

Did it move forward?

It might be a date issue. For now, FSCrawler compares dates to decide whether a file is newer than the last time it ran. The --restart option simply ignores the date and indexes everything again.
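To illustrate how a date comparison can skip files, here is a simplified model of date-based incremental crawling. It is not FSCrawler's actual code (FSCrawler is written in Java), and the function `select_files_to_index` and its parameters are hypothetical. A file is picked up only if its modification time is later than the previous run, unless a restart forces a full reindex.

```python
from datetime import datetime
from typing import Dict, List, Optional


def select_files_to_index(
    mtimes: Dict[str, datetime],
    last_run: Optional[datetime],
    restart: bool = False,
) -> List[str]:
    """Return the paths a date-based incremental crawl would (re)index.

    mtimes   -- file path -> last-modified timestamp
    last_run -- when the previous crawl ran (None on a first run)
    restart  -- mimic --restart: ignore dates and take every file
    """
    if restart or last_run is None:
        return sorted(mtimes)
    # Only files modified after the previous run count as new or changed.
    return sorted(path for path, mtime in mtimes.items() if mtime > last_run)
```

With a last run on June 10, 2020, a file last modified on June 9 is skipped even if it only appeared in the folder afterwards (for example, copied with its original timestamp), while `restart=True` returns every file.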

Yes, OCR can be slow. But if you use the auto mode it could be faster. See the "OCR integration" page of the FSCrawler 2.10-SNAPSHOT documentation.
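The general idea behind an automatic OCR strategy is to run the expensive OCR pass only when a document has no usable text layer. The sketch below models that decision; `extract_text`, the `ocr` callable, and the `min_chars` threshold are hypothetical stand-ins, not FSCrawler or Tika APIs, so check the OCR integration page for the actual setting name and its values.

```python
from typing import Callable


def extract_text(text_layer: str, ocr: Callable[[], str], min_chars: int = 10) -> str:
    """Prefer the embedded text layer; fall back to OCR only when it is (nearly) empty.

    text_layer -- text already extracted from the document without OCR
    ocr        -- callable that runs the slow OCR pass and returns its text
    min_chars  -- hypothetical threshold below which the text layer counts as missing
    """
    if len(text_layer.strip()) >= min_chars:
        return text_layer  # fast path: OCR is skipped entirely
    return ocr()  # scanned or image-only document: pay the OCR cost
```

On a folder of mostly digital PDFs, the OCR callable is then invoked only for the scanned ones, which is where the time savings come from.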

morningkaren · June 26, 2020, 3:57pm

Hi! Thanks for your response.

The indexing did not move forward. I increased the heap size and that seemed to help for one of the folders I was trying to index, though.
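FSCrawler's documentation describes passing JVM options, including heap size, through the `FS_JAVA_OPTS` environment variable (check the JVM settings section for your version). The sketch below prepares such a launch from Python; the 2 GB default heap, the job name `my_job`, and the `bin/fscrawler` path are assumptions to adapt to your installation.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_fscrawler_launch(
    job: str,
    heap: str = "2g",
    fscrawler_bin: str = "bin/fscrawler",
    extra_args: Tuple[str, ...] = (),
) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, environment) for running an FSCrawler job with a fixed heap.

    Setting -Xms and -Xmx to the same value avoids heap resizing at runtime.
    """
    env = dict(os.environ)
    env["FS_JAVA_OPTS"] = f"-Xms{heap} -Xmx{heap}"
    return [fscrawler_bin, job, *extra_args], env


def run_fscrawler(job: str, heap: str = "2g", *extra_args: str) -> subprocess.CompletedProcess:
    """Launch the job and raise CalledProcessError if FSCrawler exits with an error."""
    cmd, env = build_fscrawler_launch(job, heap, extra_args=extra_args)
    return subprocess.run(cmd, env=env, check=True)
```

For example, `run_fscrawler("my_job", "4g", "--restart")` would start the hypothetical job with a 4 GB heap and a full reindex.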

Thanks for the auto mode tip. I think it will be helpful if I use Tesseract-OCR with fscrawler in the future.

morningkaren · June 26, 2020, 3:57pm

For now, we decided to not use Tesseract-OCR and the files were indexed fine.

system · July 24, 2020, 3:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.