Web Crawler Failed HTTP request: Unable to request "<domain>" because it resolved to only private/invalid addresses

When I try to run the web crawler against a site we host, it fails with this error:

Failed HTTP request: Unable to request "<domain>" because it resolved to only private/invalid addresses

The site in question resolves to a 10.n.n.n IP address. Is the crawler configured to reject that? Is there a way to override that behavior?
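
For what it's worth, here's a rough Python sketch of the kind of private-address check that would trip on a 10.n.n.n host (the hostname below is just a placeholder, and this isn't the crawler's actual code):

```python
import ipaddress
import socket

def resolves_only_private(hostname: str) -> bool:
    """Return True if every address the hostname resolves to is private/reserved."""
    infos = socket.getaddrinfo(hostname, None)
    addrs = {ipaddress.ip_address(info[4][0]) for info in infos}
    # 10.0.0.0/8 falls under is_private, which is presumably why the crawler rejects it
    return all(addr.is_private or addr.is_reserved for addr in addrs)

# Placeholder hostname for an internally hosted site
print(resolves_only_private("intranet.example.com"))
```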

Not sure it's related, but if I target my personal site, which is not hosted internally, it also fails.

In the logs I see:

Allow none because robots.txt responded with status 599

and

Failed HTTP request: Remote host terminated the handshake

That also happens if I target the Elastic blog.
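
In case it helps anyone reproduce this, a minimal Python check of the TLS handshake looks something like the sketch below (the hostname is a placeholder; it only confirms whether a plain client can complete the handshake at all):

```python
import socket
import ssl

def try_handshake(host: str, port: int = 443) -> None:
    """Attempt a TLS handshake and print the negotiated protocol and cipher."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            print(f"{host}: {tls.version()}, cipher {tls.cipher()[0]}")

# Placeholder: substitute the site the crawler is failing against
try_handshake("www.example.com")
```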

I double-checked my personal site's robots.txt file. It's the default Drupal 8 robots.txt, so there shouldn't be anything in it that would completely block the crawler.
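
As far as I can tell, 599 isn't a standard HTTP status, so I'm guessing the robots.txt request itself failed rather than being blocked by a rule. To rule out robots.txt anyway, a quick check with Python's robotparser works; the URL and user-agent string below are only placeholders, since I don't know the crawler's exact UA:

```python
from urllib.robotparser import RobotFileParser

robots_url = "https://www.example.com/robots.txt"  # placeholder: substitute the target site
user_agent = "ExampleCrawler"                      # placeholder: substitute the crawler's UA

rp = RobotFileParser(robots_url)
rp.read()  # fetch and parse robots.txt
print(rp.can_fetch(user_agent, "https://www.example.com/"))
```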

Anyway, I'm glad this is still beta. :slight_smile:

Yes, that's the current default behavior, and it will become configurable in the next minor release.

As for the other issue you're experiencing, it sounds like you can't crawl any site at all, is that correct?

Nice.

Yep, can't crawl my personal site or the Elastic blog. Haven't tried any other sites yet.

