But it seems that the Swiftype crawler does not recognize the sitemap listed in robots.txt.
And, from the dashboard, I can't set up a sitemap on another domain (the domain part of the URL is read-only).
Do you know how I can configure it to read the sitemap properly?
Have you configured sub.example.com as one of your domains? Subdomains are not automatically included; each discrete domain must be added individually.
I'm on the standard plan, so I can use only a single domain. If I add only my "sub.example.com" domain, does the crawler get the "example.com" pages listed in the sitemap?
For more context, the "sub" domain is where my CMS lives, so all pages are created on that side, and that's why it's easier to leave the sitemap there.
@Marcelo_da_Silva sounds like you need to upgrade your plan level if you want to crawl two domains. You could crawl sub.example.com only, but then you would not get the pages from example.com. The other option would be to put your sitemap for example.com on example.com (but you still wouldn't get pages from sub.example.com).
The other option is to move or copy your sitemap to example.com, if there's no content on sub.example.com that you want crawled (other than the current sitemap).
The weird part is that, when I add example.com, Swiftype recognizes the sitemap URL on the subdomain and even displays it on the success screen.
But then, when the crawler runs, it does not seem to use that sitemap.
Ya, that's definitely confusing. What's likely happening is it's pulling the sitemap URL from the robots.txt file associated with your site. I can't confirm because I don't know what your robots.txt looks like.
The problem then comes when the crawler goes to fetch the sitemap and it's not on the domain being crawled. So we see it, but we can't fetch it.
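To make the mismatch concrete, here is a minimal Python sketch that lists the `Sitemap:` entries in a robots.txt body whose host differs from the domain configured for crawling. This is not Swiftype's actual crawler logic; the function name `off_domain_sitemaps` and the sample robots.txt content are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def off_domain_sitemaps(robots_txt: str, crawl_domain: str) -> list[str]:
    """Return Sitemap URLs in robots.txt whose host is not crawl_domain.

    Hypothetical helper: it only compares hostnames and does not
    reproduce any real crawler's behavior.
    """
    mismatched = []
    for line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "sitemap":
            continue
        url = value.strip()
        host = urlsplit(url).hostname  # lowercased by urlsplit, or None
        if host and host != crawl_domain.lower():
            mismatched.append(url)
    return mismatched


# Assumed robots.txt content, modeled on the situation in this thread.
sample = """User-agent: *
Sitemap: https://sub.example.com/sitemap.xml
"""
print(off_domain_sitemaps(sample, "example.com"))
```

A non-empty result would flag exactly the situation in this thread: a sitemap that is advertised in robots.txt but lives on a host that is not being crawled.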
Unfortunately, your options remain as Sean described. If hosting your sitemap on the example.com domain is not convenient for you, you will have to upgrade in order to crawl it.
Hello Brian. That's exactly what's happening. My sitemap URL is in my robots.txt, and it seems that the Swiftype dashboard recognizes it during setup.
But when the crawler runs, the sitemap is ignored completely because it lives on another subdomain. This seems to be a bug on your side, as the behavior is very confusing for users.
For now, I created a proxy so the sitemap can be accessed via 'example.com' without needing to move the entire sitemap there, so it should be okay!
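The thread does not say how that proxy was built. As one possible shape, here is a minimal Python sketch using only the standard library: a small HTTP server that answers `/sitemap.xml` by fetching the sitemap from the subdomain. The upstream URL, the port, and the names `fetch_upstream`, `make_handler`, and `serve` are all assumptions; in practice a rewrite rule in the web server already fronting example.com would likely do the same job.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Assumed location of the real sitemap (placeholder from this thread).
UPSTREAM_SITEMAP = "https://sub.example.com/sitemap.xml"


def fetch_upstream() -> bytes:
    """Download the sitemap from the subdomain where the CMS lives."""
    with urlopen(UPSTREAM_SITEMAP, timeout=10) as resp:
        return resp.read()


def make_handler(fetch=fetch_upstream):
    """Build a request handler that serves /sitemap.xml via fetch()."""

    class SitemapProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/sitemap.xml":
                self.send_error(404)
                return
            try:
                body = fetch()
            except OSError:
                # Network or HTTP errors from urlopen are OSError subclasses.
                self.send_error(502)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return SitemapProxy


def serve(port: int = 8080) -> None:
    """Run the proxy until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), make_handler()).serve_forever()
```

Calling `serve()` starts the proxy locally. Note that this only relays the file: the URLs listed inside the sitemap are passed through unchanged.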
You are correct that the crawler assumes the sitemap will exist at example.com/sitemap.xml. As noted in the documentation for Crawler Configuration, you can configure a custom sitemap URL.
Also, I agree: the messaging around recognizing a sitemap in robots.txt could be improved when that sitemap URL does not match a domain that has been configured for crawling.