Sitemap for example.com located at sub.example.com

Marcelo_da_Silva · January 29, 2021, 11:04am

Hi, the sitemap of a website is located on a subdomain.

Something like

example.com (main)
sub.example.com/sitemap_index.xml (sitemap for example.com)
example.com/robots.txt (it has a Sitemap key indicating the location of the sitemap)
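
For illustration, here is a minimal sketch of how the Sitemap key in such a robots.txt can be read with Python's standard library. The robots.txt body below is a hypothetical example built from the placeholder names in this post, not the real file.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at example.com/robots.txt (assumed content).
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://sub.example.com/sitemap_index.xml
"""


def sitemap_urls(robots_txt: str) -> List[str]:
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines exist (Python 3.8+).
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))
```

Note that the declared sitemap lives on a different host than the robots.txt file itself.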

But it seems that the Swiftype crawler does not recognize the sitemap declared in robots.txt.
And from the dashboard, I can't set up a sitemap on another domain (the domain part is read-only).

Do you know how I can configure it to read properly from the sitemap?

Sean_Story · January 29, 2021, 10:17pm

Hi @Marcelo_da_Silva !

Have you configured sub.example.com as one of your domains? Subdomains are not automatically included; each discrete domain must be added individually.

Marcelo_da_Silva · February 3, 2021, 4:15pm

Thanks for your reply @Sean_Story

I'm on the standard plan, so I can use only a single domain. If I add only my "sub.example.com" domain, does the crawler get the pages from "example.com" that are listed in the sitemap?

I ask because all the links for "example.com" are in "sub.example.com/sitemap_index.xml".

For better context, the "sub" domain is where my CMS lives, so all pages are created on that side, and that's why it's easier to leave the sitemap there.

Sean_Story · February 3, 2021, 7:58pm

@Marcelo_da_Silva sounds like you need to upgrade your plan level if you want to crawl two domains. You could add sub.example.com only, but then you would not get the pages from example.com itself. The other option would be to put your sitemap for example.com on example.com (but you still wouldn't get pages from sub.example.com).

Marcelo_da_Silva · February 4, 2021, 7:54am

Hummm, "sub.example.com" does not have any pages to be crawled, all pages in the sitemap are from "example.com".

In that case, I have to upgrade and add 2 domains just to make the crawler find my sitemap:

add "example.com" so the pages can be recognized and the documents created
add "sub.example.com" so the sitemap can be recognized (and only that, since there will be no documents on that domain)

Is that correct?

Sean_Story · February 4, 2021, 4:22pm

That's definitely one option.

The other option is to move or copy your sitemap to example.com, if there's no content on sub.example.com that you want crawled (other than the current sitemap).

Marcelo_da_Silva · February 12, 2021, 8:14am

The weird part is that when I add example.com, Swiftype recognizes the sitemap URL on the subdomain and even displays it on the success screen.

But then when the crawler runs, it seems to not use that sitemap.

Brian_McGue · February 17, 2021, 7:48pm

Hi @Marcelo_da_Silva,

Ya, that's definitely confusing. What's likely happening is it's pulling the sitemap URL from the robots.txt file associated with your site. I can't confirm because I don't know what your robots.txt looks like.

The problem then comes when the crawler goes to fetch the sitemap and it's not on the same subdomain. So we see it, but we can't fetch it.
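
As a rough model of that mismatch, the check below compares the sitemap's host against the list of configured domains. This is only a sketch of the behavior described in this thread, not Swiftype's actual code; the function name and the exact-match rule are assumptions.

```python
from urllib.parse import urlsplit


def sitemap_is_on_configured_domain(sitemap_url, configured_domains):
    """Return True if the sitemap's host exactly matches a configured domain.

    Exact matching is assumed here because subdomains are not
    automatically included when a parent domain is added.
    """
    host = (urlsplit(sitemap_url).hostname or "").lower()
    return host in {domain.lower() for domain in configured_domains}


# The sitemap URL is visible in robots.txt, but its host is not configured.
print(sitemap_is_on_configured_domain(
    "https://sub.example.com/sitemap_index.xml", ["example.com"]))
```

Under this model, the check passes only once the sitemap is served from example.com or sub.example.com is added as a second domain.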

Unfortunately, your options remain as Sean described. If hosting your sitemap on the example.com domain is not convenient for you, you will have to upgrade in order to crawl it.

Best,
Brian

Marcelo_da_Silva · February 22, 2021, 10:07am

Hello Brian. That's exactly what's happening. My sitemap url is on my robots.txt and it seems that the Swiftype dashboard recognizes it during the setup.

But when the crawler runs, the sitemap is ignored completely because it lives on another subdomain. This seems to be a bug on your side, as the behavior is very confusing for users.

For now, I have set up a proxy so the sitemap can be accessed via 'example.com' without moving the entire sitemap there, so it should be okay!
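
A proxy along these lines could be sketched as follows. This is a minimal illustration using only Python's standard library, not the setup actually used here; the upstream URL, public path, port, and all names are assumptions, and in practice a web server rule on example.com would likely do the same job.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Assumed locations, based on the placeholder names used in this thread.
UPSTREAM_SITEMAP = "https://sub.example.com/sitemap_index.xml"
PUBLIC_PATH = "/sitemap_index.xml"


def fetch_upstream(url=UPSTREAM_SITEMAP):
    """Download the sitemap bytes from the subdomain."""
    with urlopen(url, timeout=10) as response:
        return response.read()


class SitemapProxy(BaseHTTPRequestHandler):
    # Swappable so the handler can be exercised without network access.
    fetch = staticmethod(fetch_upstream)

    def do_GET(self):
        # Only the sitemap path is exposed; everything else is a 404.
        if self.path != PUBLIC_PATH:
            self.send_error(404)
            return
        try:
            body = self.fetch()
        except OSError:
            # urllib's URLError subclasses OSError: upstream was unreachable.
            self.send_error(502)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("", port), SitemapProxy).serve_forever()
```

Calling serve() on the host behind example.com would make the sitemap available at example.com/sitemap_index.xml while the file itself stays on the subdomain.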

Marcelo_da_Silva · February 22, 2021, 10:55am

Another thing I noticed was that:

my sitemap was named like: 'example.com/sitemap_index.xml'
during setup, it was recognized properly
during the crawl, it seems it wasn't being used, so I went to check the config
there was a "custom sitemap" URL there with "example.com/sitemap.xml" in it

So it seems that the setup is not working properly and the crawler assumes the sitemap is named "/sitemap.xml".

Brian_McGue · February 22, 2021, 6:06pm

The proxy was a great choice! Well done.

You are correct in that the crawler assumes the sitemap will exist at example.com/sitemap.xml. As noted in the documentation for Crawler Configuration, you can configure a custom sitemap URL.

Also, I agree, the messaging around recognizing a sitemap on the robots.txt could be improved if that sitemap URL does not match a domain that's been configured to crawl.

Marcelo_da_Silva · February 23, 2021, 2:35pm

Good!

Thanks @Brian_McGue and @Sean_Story for your support!

system · March 23, 2021, 2:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
