Sitemap for example.com located at sub.example.com

Hi, the sitemap of a website is located on a subdomain.

Something like

But it seems that the Swiftype crawler does not recognize the sitemap listed in robots.txt.
And, from the dashboard, I can't set up a sitemap on another domain (the domain part is read-only).

Do you know how I can configure it to read properly from the sitemap?

Hi @Marcelo_da_Silva !

Have you configured sub.example.com as one of your domains? Subdomains are not automatically included; each discrete domain must be added individually.

Thanks for your reply @Sean_Story

I'm on the standard plan, so I can use only a single domain. If I add only my "sub.example.com" domain, does the crawler get the pages from "example.com" listed in the sitemap?

Because, all the links for "example.com" are on "sub.example.com/sitemap_index.xml".

To add some context, the "sub" domain is where my CMS lives, so all pages are created on that side, and that's why it's easier to leave the sitemap there.

@Marcelo_da_Silva sounds like you need to upgrade your plan level, if you want to crawl two domains. You could do sub.example.com only, but you would not get the pages from just example.com. The other option would be to put your sitemap for example.com on example.com (but you still wouldn't get pages from sub.example.com).

Hummm, "sub.example.com" does not have any pages to be crawled, all pages in the sitemap are from "example.com".

In that case, I have to upgrade and add 2 domains just to make the crawler find my sitemap:

  • add "example.com" so the pages can be recognized and the documents created

  • add "sub.example.com" so the sitemap can be recognized (and only that, since there will be no documents on that domain)

Is that correct?

That's definitely one option.

The other option is to move or copy your sitemap to example.com, if there's no content on sub.example.com that you want crawled (other than the current sitemap).

The weird part is that, when I add example.com, Swiftype recognizes the sitemap URL on the subdomain and even displays it on the success screen.

But then when the crawler runs, it seems to not use that sitemap.

Hi @Marcelo_da_Silva,

Ya, that's definitely confusing. What's likely happening is it's pulling the sitemap URL from the robots.txt file associated with your site. I can't confirm because I don't know what your robots.txt looks like.

The problem then comes when the crawler goes to fetch the sitemap and it's not on the same subdomain. So we see it, but we can't fetch it.
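To make the "we see it, but we can't fetch it" situation concrete, here is a minimal Python sketch. It is not Swiftype's actual logic, only an illustration under stated assumptions: `sitemap_urls` and `is_fetchable` are hypothetical helper names, and the "configured domains" set stands in for whatever domains have been added in the dashboard.

```python
from urllib.parse import urlparse


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed on `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_fetchable(sitemap_url: str, configured_domains: set[str]) -> bool:
    """Assumption: a sitemap is only fetched if its host is a configured domain."""
    host = (urlparse(sitemap_url).hostname or "").lower()
    return host in configured_domains


robots = "User-agent: *\nSitemap: https://sub.example.com/sitemap_index.xml\n"
for url in sitemap_urls(robots):
    # Discovered in robots.txt, but hosted outside the single configured domain.
    print(url, is_fetchable(url, {"example.com"}))
```

With only example.com configured, the sitemap URL is discovered but its host does not match, which is consistent with the behavior described in this thread.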

Unfortunately, your options remain as Sean described. If hosting your sitemap on the example.com domain is not convenient for you, you will have to upgrade in order to crawl it.

Best,
Brian

Hello Brian. That's exactly what's happening. My sitemap url is on my robots.txt and it seems that the Swiftype dashboard recognizes it during the setup.

But when the crawler runs, the sitemap is ignored completely because it lives on another subdomain. This seems to be a bug on your side, as the behavior is very confusing for users.

For now, I created a proxy so the sitemap can be accessed through 'example.com' without needing to move the entire sitemap there, so it should be okay!
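For anyone wanting to try the same workaround, a proxy like this can be sketched with Python's standard library. The thread does not say how the actual proxy was built, so everything here is an assumption for illustration: the `UPSTREAM` host, the `PROXIED_PATHS` set, the port, and the `upstream_url`, `SitemapProxy`, and `main` names are all hypothetical. In practice a rewrite rule in the web server that already serves example.com would probably be the more common choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.error import URLError
from urllib.request import urlopen

UPSTREAM = "https://sub.example.com"     # assumption: where the CMS hosts the sitemap
PROXIED_PATHS = {"/sitemap_index.xml"}   # assumption: only these paths are forwarded


def upstream_url(path: str) -> Optional[str]:
    """Map a request path on example.com to the sitemap URL on the subdomain."""
    return UPSTREAM + path if path in PROXIED_PATHS else None


class SitemapProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        target = upstream_url(self.path)
        if target is None:
            self.send_error(404)
            return
        try:
            # Fetch the sitemap from the subdomain on behalf of the client.
            with urlopen(target, timeout=10) as resp:
                body = resp.read()
                content_type = resp.headers.get("Content-Type", "application/xml")
        except URLError:
            self.send_error(502, "Could not fetch upstream sitemap")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Serve on localhost; a front-end web server would route sitemap requests here.
    HTTPServer(("127.0.0.1", 8080), SitemapProxy).serve_forever()
```

Calling `main()` starts the server. The sitemap then appears to live on example.com while the file itself stays on the CMS side.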

Another thing I noticed was that:

So it seems that the setup is not working properly and the crawler assumes the sitemap is named "/sitemap.xml".

The proxy was a great choice! Well done.

You are correct in that the crawler assumes the sitemap will exist at example.com/sitemap.xml. As noted in the documentation for Crawler Configuration, you can configure a custom sitemap URL.

Also, I agree, the messaging around recognizing a sitemap on the robots.txt could be improved if that sitemap URL does not match a domain that's been configured to crawl.

Good! :slight_smile:

Thanks @Brian_McGue and @Sean_Story for your support!
