Web crawler (github.com/elastic/crawler): how to fetch only specific URLs

I am playing around with the crawler and it works great. However, I am unable to find a way to write a crawl config that scrapes only a specific set of URLs and nothing else. I already have the list of URLs.

Similar to this question: How to index only given urls in the Elasticsearch using Open Crawler - #3 by jahedi

This fixed it: How to index only given urls in the Elasticsearch using Open Crawler - #2 by nfeekery

You need to:

  • Set seed_urls to the URLs that you want to sync
  • sitemap_discovery_disabled: true
  • max_crawl_depth: 1
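
Put together, the three settings above might look something like the sketch below. This is only an illustration: the domain, the page paths, and the exact placement of sitemap_discovery_disabled and max_crawl_depth in the file are my assumptions, so check them against the config.yml.example shipped with the crawler before relying on it.

```yaml
# Hypothetical crawl config. example.com and the page paths are placeholders.
domains:
  - url: https://example.com
    seed_urls:                         # the exact pages you want to sync
      - https://example.com/docs/page-a
      - https://example.com/docs/page-b

# Assumed here to be top-level keys; verify against config.yml.example.
sitemap_discovery_disabled: true       # do not pick up extra URLs from sitemaps
max_crawl_depth: 1                     # do not follow links found on the seed pages
```

As I understand it, the seed URLs themselves count as depth 1, so with max_crawl_depth: 1 the crawler fetches them and stops there instead of following the links it finds on those pages.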