Hi,
I was wondering whether a web crawler can be configured so that only the links found directly on the entry point page are crawled and everything deeper is skipped.
E.g.
Entry_page has Link1 and Link2 -> both sub-links should be crawled
Link1_page has Link3 -> Link3 should not be crawled
I have considered crawl rules, but there is no generic or simple text in the URLs that would let me distinguish the unneeded links.
You can configure this in two ways. If you want the change to apply to all crawlers, update the Enterprise Search configuration and change the value of connector.crawler.crawl.max_crawl_depth.limit.
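To make the effect of a depth limit concrete, here is a rough Python sketch of a depth-limited traversal over the example from the question. It is not the crawler's actual code. The function name, the in-memory `links` map, and the assumption that the entry point counts as depth 1 are all illustrative, so check the depth numbering against the documentation for your version before picking a value.

```python
from collections import deque


def crawl_with_depth_limit(links, entry_point, max_depth):
    """Return the pages visited when link following stops at max_depth.

    Assumption (not confirmed by the thread): the entry point is depth 1.
    `links` is a hypothetical map of page -> links found on that page.
    """
    depth_of = {entry_point: 1}  # also serves as the visited set
    queue = deque([entry_point])
    while queue:
        page = queue.popleft()
        if depth_of[page] >= max_depth:
            continue  # page is at the limit: do not follow its links
        for target in links.get(page, []):
            if target not in depth_of:
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return list(depth_of)  # pages in the order they were discovered


# The example from the question: Entry_page -> Link1, Link2; Link1 -> Link3
links = {
    "Entry_page": ["Link1", "Link2"],
    "Link1": ["Link3"],
}
crawled = crawl_with_depth_limit(links, "Entry_page", 2)
print(crawled)
```

Under that assumption a limit of 2 covers the entry page plus Link1 and Link2, and Link3 is never reached.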