If FSCrawler stops mid-scan, will it restart from scratch?

As a newcomer to Elasticsearch, I want to index a large filesystem (100TB) with FSCrawler.

I’ve heard that if the initial scan is interrupted, FSCrawler might re-scan everything from the beginning.

What’s the best strategy to handle this for such a massive dataset?

Are there checkpointing/resume features, configuration tweaks, or alternative workflows to avoid redundant work?

Welcome!

This is correct. Until a change is made, the only workaround I can see is to start multiple FSCrawler instances, one per directory in the root directory, as sketched below.
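For example, a minimal per-directory job could look like this (a sketch only: the paths, job names, and index name below are hypothetical, and the exact settings keys can differ between FSCrawler versions):

```yaml
# ~/.fscrawler/job_dir1/_settings.yaml  (hypothetical job name and paths)
name: "job_dir1"
fs:
  # One subdirectory of the large root per FSCrawler instance
  url: "/mnt/bigshare/dir1"
  update_rate: "15m"
elasticsearch:
  nodes:
    - url: "https://127.0.0.1:9200"
  # All instances can target the same index so searches see one dataset
  index: "bigshare"
```

You would create one such job per top-level directory (`job_dir2` for `/mnt/bigshare/dir2`, and so on) and start each instance separately, e.g. `bin/fscrawler job_dir1`, so an interruption only loses the progress of that one subtree.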

Thank you for your support. Once I have scanned all the files using multiple FSCrawler instances, will running FSCrawler on the root directory then continue tracking changes without performing a full re-scan?

I think you will need to keep it running the same way it was run for the first scan.
I say "I think" because I'm not sure about it; I don't remember how the document ids are computed.

Maybe you could run it from the root and use a different includes setting for each instance, and then run it again from the root without the includes setting. That way, all ids should be consistent; see the sketch below.
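A rough sketch of that idea (hypothetical names and paths again, and the exact include pattern syntax may vary by version): every instance shares the same root `url` but restricts itself with `includes`, and the final, ongoing job drops `includes` entirely:

```yaml
# Instance 1: scans from the root but is restricted to one subtree
name: "bigshare_part1"
fs:
  url: "/mnt/bigshare"
  includes:
    - "/dir1/*"
elasticsearch:
  index: "bigshare"
---
# Instance 2: same root url, a different includes pattern
name: "bigshare_part2"
fs:
  url: "/mnt/bigshare"
  includes:
    - "/dir2/*"
elasticsearch:
  index: "bigshare"
---
# Later, a single ongoing job: same root url, no includes, same index
name: "bigshare"
fs:
  url: "/mnt/bigshare"
elasticsearch:
  index: "bigshare"
```

Because every job scans from the same root `url`, the ids derived from the file paths should line up, so the later root-level run should update the existing documents in place rather than create duplicates.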