Runtime Environment Requirements for FS Crawler

We are planning to deploy FS Crawler to a VM on Azure, and I am trying to provision that VM. Here is the information relevant to the provisioning:

  • OS: Ubuntu 20.04
  • PDF files total size: 1.5 TB
  • FS Crawler schedule: a daily OS-level cron job that runs FS Crawler once per day

According to the FS Crawler documentation (the "Tips and tricks" page of the FSCrawler 2.10-SNAPSHOT documentation), it will generate huge temporary files, and a cron job can be set up to clean them up periodically. This raises a concern for provisioning the VM in the cloud. Can anyone recommend a configuration for the VM, such as the number of CPU cores and the amount of storage?
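For context, the kind of periodic cleanup mentioned above could look roughly like the following. This is only a minimal Python sketch, not something taken from the FSCrawler documentation: the function name `remove_stale_temp_files`, the `apache-tika-*` file name pattern, and the 24-hour age threshold are all assumptions and would need to be checked against what actually accumulates in the temp directory.

```python
import time
from pathlib import Path


def remove_stale_temp_files(directory, pattern="apache-tika-*",
                            max_age_hours=24.0, now=None):
    """Delete files in `directory` that match `pattern` and are older
    than `max_age_hours`. Returns the list of removed paths.

    The default pattern and age are assumptions for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600  # anything modified before this is stale
    removed = []
    for path in Path(directory).glob(pattern):
        try:
            # Only touch regular files whose last modification is before the cutoff.
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # The file disappeared between listing and deletion; skip it.
            continue
    return removed
```

A small script calling `remove_stale_temp_files("/tmp")` could then be scheduled from cron alongside the daily crawl. Deleting only files older than a threshold is meant to avoid removing temporary files that a running crawl is still using.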

Any help will be greatly appreciated.

I think this advice in the documentation only applies to media files and specifically mp4 videos.
Is that what you want to index?
