Runtime Environment Requirements for FS Crawler

We are planning to deploy FS Crawler to a VM on Azure, and I am trying to provision the VM. Here is the information relevant to the provisioning:

  • OS: Ubuntu 20.04
  • PDF files total size: 1.5 TB
  • FS Crawler schedule: a daily OS cron job that runs FS Crawler once per day

According to the FS Crawler documentation ("Tips and tricks" in the FSCrawler 2.10-SNAPSHOT documentation), it can generate huge temporary files, and a cron job can be set up to clean them up periodically. This raises a concern for me when sizing the VM in the cloud. Can anyone recommend a configuration for the VM, such as the number of CPU cores and the amount of storage?

Any help will be greatly appreciated.

I think this advice in the documentation only applies to media files and specifically mp4 videos.
Is that what you want to index?
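
If temporary files do turn out to be a problem on your VM, a periodic cleanup could look roughly like the sketch below. It is only an illustration, not something taken from the FS Crawler documentation: the function name, the "apache-tika-*" file name pattern, and the 60-minute age threshold are all assumptions, so check what actually accumulates in your temp directory before using anything like this.

```python
import time
from pathlib import Path


def remove_stale_temp_files(temp_dir, pattern="apache-tika-*", max_age_minutes=60):
    """Delete regular files directly inside temp_dir that match pattern and
    were last modified more than max_age_minutes ago.

    The default pattern and age are assumptions for illustration only.
    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_minutes * 60
    removed = []
    for path in Path(temp_dir).glob(pattern):
        try:
            # Skip directories and anything modified recently, since a
            # running crawl may still be using it.
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path.name)
        except FileNotFoundError:
            # The file disappeared between listing and deletion; ignore it.
            continue
    return sorted(removed)
```

You would call something like remove_stale_temp_files("/tmp") from a small script scheduled by cron. The age check is there so that files belonging to a crawl that is still running are left alone.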

