Webcrawl one level

etp · March 22, 2024, 3:30pm

Hi,
I was wondering whether a web crawler can be configured so that only the links found directly on the entry point page are crawled, and everything deeper is skipped.
E.g.
Entry_page has Link1 and Link2 -> both sub-links should be crawled
Link1_page has Link3 -> Link3 should not be crawled

I have considered crawl rules, but there is no generic or simple text in the URLs that would let me distinguish the unneeded links.

Thanks in advance.

nfeekery · March 22, 2024, 3:38pm

Hi @etp

You can configure this in two ways. If you want the change to apply to all crawlers, update the Enterprise Search configuration by changing the value of connector.crawler.crawl.max_crawl_depth.limit.

You can also set this per crawl if you choose to crawl with custom settings.
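
To illustrate what a depth limit does to the example in the question, here is a minimal Python sketch of a depth-limited traversal over an in-memory link graph. It is not the crawler's actual implementation: the function name `crawl_to_depth`, the `links` dictionary, and the convention that the entry point counts as depth 1 are all assumptions made for illustration, so it is worth checking how your crawler version counts depth before picking a value.

```python
from collections import deque


def crawl_to_depth(links, entry_point, max_depth):
    """Breadth-first walk that records each reachable page and its depth.

    Assumption for this sketch: the entry point itself is depth 1,
    so max_depth=2 means "entry point plus its immediate links".
    """
    visited = {entry_point: 1}
    queue = deque([entry_point])
    while queue:
        page = queue.popleft()
        depth = visited[page]
        if depth >= max_depth:
            # Pages at the limit are kept, but their links are not followed.
            continue
        for target in links.get(page, []):
            if target not in visited:
                visited[target] = depth + 1
                queue.append(target)
    return visited


# Hypothetical link graph mirroring the example in the question.
links = {
    "Entry_page": ["Link1", "Link2"],
    "Link1": ["Link3"],
}

result = crawl_to_depth(links, "Entry_page", max_depth=2)
print(sorted(result))
```

Under that depth-1 assumption, a limit of 2 keeps Entry_page, Link1 and Link2, and Link3 is never visited because links found on Link1 are not followed.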
