Web crawler (github.com/elastic/crawler) to only fetch specific URLs

Vikram_Tiwari · March 10, 2025, 10:17pm

I am playing around with the crawler and it works great. However, I am unable to find a way to write a crawl config that scrapes only a specific set of URLs and nothing else. I already have the list of URLs.

Similar to this question: How to index only given urls in the Elasticsearch using Open Crawler - #3 by jahedi

Vikram_Tiwari · March 10, 2025, 10:43pm

This fixed it: How to index only given urls in the Elasticsearch using Open Crawler - #2 by nfeekery

You need to:

Set seed_urls to the URLs that you want to sync
Set sitemap_discovery_disabled: true
Set max_crawl_depth: 1
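
Since the starting point here is a plain list of URLs, the three settings above could be generated rather than typed by hand. The following is only a rough sketch: `build_crawl_config` is a hypothetical helper, not part of the crawler, and it assumes that `seed_urls` sits under a `domains` entry keyed by `url` while `sitemap_discovery_disabled` and `max_crawl_depth` are top-level keys. Check that layout against the example config shipped in the crawler repository before relying on it. Output sink settings are left out on purpose.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def build_crawl_config(urls):
    """Render crawl-config YAML text that lists `urls` as the only seed URLs.

    Hypothetical helper; the key layout is an assumption, not confirmed API.
    """
    # Group URLs by origin (scheme + host), assuming seed_urls belong to a domain entry.
    by_domain = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        by_domain[f"{parts.scheme}://{parts.netloc}"].append(url)

    lines = ["domains:"]
    for domain, seeds in by_domain.items():
        # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
        lines.append(f"  - url: {json.dumps(domain)}")
        lines.append("    seed_urls:")
        lines.extend(f"      - {json.dumps(seed)}" for seed in seeds)

    # The two settings from the post above (placement at top level is assumed).
    lines.append("sitemap_discovery_disabled: true")
    lines.append("max_crawl_depth: 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_crawl_config([
        "https://example.com/docs/a",
        "https://example.com/docs/b",
    ]))
```

Redirecting the printed text into a file gives a config that can be passed to the crawler in the usual way. The example URLs are placeholders.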

