Our App Search engine is connected to a crawler. Is there a way to make this crawler ignore the sitemap defined in robots.txt?
Hey there, @GTHvidsten! There isn't currently a way to configure the crawler to ignore the sitemap listed in robots.txt, but you can certainly dictate the scope of crawls using crawl rules, or use meta tags in the pages themselves to exclude content. You can also upload a custom sitemap for the crawler to use in crawls. May I ask what your use case is for ignoring the sitemap defined in robots.txt?
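For example, here's a rough sketch of registering a custom sitemap and adding a crawl rule through the App Search Crawler API. The base URL, engine name, domain ID, and API key are placeholders, and you should verify the endpoint paths and request bodies against the Crawler API docs for your Enterprise Search version:

```python
import requests

# Placeholders -- substitute your own deployment URL, engine, domain ID,
# and private API key. Endpoint paths follow the App Search Crawler API;
# double-check them for your Enterprise Search version.
BASE = "http://localhost:3002/api/as/v1"
ENGINE = "my-engine"
DOMAIN_ID = "your-domain-id"
HEADERS = {"Authorization": "Bearer private-xxxxxxxxxxxx"}

# Register a custom sitemap for the crawler to use in addition to
# whatever robots.txt advertises.
resp = requests.post(
    f"{BASE}/engines/{ENGINE}/crawler/domains/{DOMAIN_ID}/sitemaps",
    headers=HEADERS,
    json={"url": "https://www.example.com/custom-sitemap.xml"},
)
resp.raise_for_status()

# Add a crawl rule that excludes a section of the site from crawls.
# (Field names such as "policy", "rule", and "pattern" follow the
# Crawler API docs; verify the exact schema for your version.)
resp = requests.post(
    f"{BASE}/engines/{ENGINE}/crawler/domains/{DOMAIN_ID}/crawl_rules",
    headers=HEADERS,
    json={"policy": "deny", "rule": "begins", "pattern": "/private"},
)
resp.raise_for_status()
```

Both of these can also be configured from the Kibana UI if you'd rather not script it.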
The use case is that I would like to generate the sitemap from the documents in Elastic, which would significantly reduce generation time on our side. The sitemap shouldn't contain anything the crawler can't find anyway, so the documents in Elastic would be the perfect basis for it.
But, as I figured out, Elastic would then be bound by that sitemap, creating a catch-22: any new pages on the website would not be crawled, because they aren't part of the sitemap the crawler uses.
I've already come up with an alternative solution, but I still think it would be nice to have the crawler ignore the sitemap, so that a sitemap could be generated from what has been crawled.
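For anyone landing here later, this is roughly the kind of generation I mean: list the documents the crawler has already indexed and emit a sitemap from their URLs. It's only a sketch, assuming crawler-indexed documents carry a `url` field; the deployment URL, engine name, and API key are placeholders, and the `documents/list` endpoint is from the App Search Documents API, so verify it for your version:

```python
import requests
from xml.sax.saxutils import escape

BASE = "http://localhost:3002/api/as/v1"
ENGINE = "my-engine"
HEADERS = {"Authorization": "Bearer private-xxxxxxxxxxxx"}  # placeholder key

urls, page = [], 1
while True:
    # Page through the engine's documents (App Search List documents API).
    resp = requests.get(
        f"{BASE}/engines/{ENGINE}/documents/list",
        headers=HEADERS,
        params={"page[current]": page, "page[size]": 100},
    )
    resp.raise_for_status()
    data = resp.json()
    # Crawler-indexed documents normally store the page address in "url".
    urls += [doc["url"] for doc in data["results"] if doc.get("url")]
    if page >= data["meta"]["page"]["total_pages"]:
        break
    page += 1

# Emit a minimal sitemap from the deduplicated, sorted URLs.
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for u in sorted(set(urls)):
        f.write(f"  <url><loc>{escape(u)}</loc></url>\n")
    f.write("</urlset>\n")
```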