Make the search crawler ignore sitemap

Our App Search engine is connected to a crawler. Is there a way to make this crawler ignore the sitemap defined in robots.txt?

Hey there, @GTHvidsten! There isn't currently a way to configure the crawler to ignore the sitemap listed in robots.txt, but you can certainly control the scope of crawls using crawl rules, or tags in the website itself, to exclude content. You can also upload a custom sitemap for the crawler to use in crawls. May I ask what your use case is for ignoring the sitemap defined in robots.txt?
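For context, the `Sitemap:` directive in robots.txt is independent of any `User-agent` group, which is how crawlers in general pick it up. As a generic illustration only (this uses Python's standard-library `urllib.robotparser`, not the Elastic crawler's code), here is a minimal sketch of extracting the sitemap URLs from a robots.txt body; the helper name `sitemaps_from_robots` is hypothetical:

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return the Sitemap URLs declared in a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts the file as a sequence of lines, so no network fetch is needed
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap directive is present
    return parser.site_maps() or []


robots = """User-agent: *
Disallow: /private
Sitemap: https://example.com/sitemap.xml
"""
print(sitemaps_from_robots(robots))
```

A crawler that reads this directive will use whatever sitemap it points to, which is why the sitemap's contents end up shaping the crawl.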

The use case is that I would like to generate the sitemap based on the documents in Elastic. That would significantly speed up the generation time on our side. The sitemap shouldn't contain anything that the crawler can't find, so the documents in Elastic would be the perfect basis for a sitemap.
But, as I found out, the crawler would then be bound by this sitemap, creating a catch-22: any new pages on the website would not be crawled, as they are not part of the sitemap that the crawler uses.
I've already come up with an alternate solution to this, but I still think it would be nice to have the crawler ignore the sitemap so that a sitemap could be generated from what has been crawled.
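A minimal sketch of the "generate the sitemap from what has been crawled" idea, assuming the crawled documents have already been exported from the engine as dicts with a `url` field (the export step is not shown, and both the document shape and the `build_sitemap` name are hypothetical). It writes a standard sitemaps.org `urlset` with one de-duplicated `<loc>` entry per document:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(documents):
    """Build sitemap XML from crawled documents.

    Assumes each document is a dict with a 'url' key (hypothetical shape;
    adjust to however the documents are exported from the engine).
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    seen = set()
    for doc in documents:
        url = doc.get("url")
        # Skip documents without a URL and URLs already added
        if not url or url in seen:
            continue
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the text for us
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


docs = [
    {"url": "https://example.com/"},
    {"url": "https://example.com/about"},
    {"url": "https://example.com/about"},  # duplicate, emitted once
]
print(build_sitemap(docs))
```

Note that this only reproduces what has already been crawled, so it runs into the catch-22 described above if the same file is also the sitemap referenced in robots.txt.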

