Please read this, specifically the "Also be patient" part.
Personally, I consider someone whose production cluster is down to be more urgent than a question about a project that does not exist yet.
Anyway, some answers:
- No idea. I have never used Nutch. Maybe ask on the Nutch mailing list, if there is one?
- It depends on what your needs are. For example, I wrote FSCrawler to crawl files on disk and parse them with Apache Tika.
- The question is too broad.