Web Crawler Setup: "Content Verification" for Domain fails

Hi,

I'd like to set up a domain for the web crawler, but it fails at the last step, "Content Verification", with:

"The web server at http://www.test.de redirected us to a different domain URL (http://www.test.com/). If you want to crawl this site, please use http://www.test.com as the domain name."

The problem is that I need to crawl pages located in a subdirectory of test.de, say test.de/ressources, and I can't touch any DNS or server settings for this domain to change the redirect behaviour of the root ("/").

What I need is to set up a domain in App Search independently of its redirect behaviour, say via a flag or something similar. Is there a setting I overlooked in the docs?

Is there another option besides using a different tool for crawling and uploading JSON files?

Thanks for helping.

Some more details:

(Note: I use "test.de" to demonstrate the problem; unfortunately I cannot disclose the real domain)

  • I use a self-hosted stack, version 7.16.2

This is a little clunky, but you can add your domain via the API to bypass verification. Here is the API documentation:

Example request to add a domain:

curl -X POST http://[ENTERPRISE-SEARCH-URL]/api/as/v0/engines/[ENGINE-NAME]/crawler/domains -H "Content-Type: application/json" -d '
{
  "name": "http://www.test.de"
}'

Once this is done, you can add the entry point in the App Search UI. Alternatively, the entry point can be added using the API:

curl -X POST http://[ENTERPRISE-SEARCH-URL]/api/as/v0/engines/[ENGINE-NAME]/crawler/domains/[DOMAIN-ID]/entry_points -H "Content-Type: application/json" -d '
{
  "value": "/ressources/"
}'

The value of DOMAIN-ID is the id returned in the first response, when adding the domain.
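
If you want to chain the two requests, something like the following might work. This is only a sketch: it assumes jq is installed, that the first response carries the domain id in a top-level "id" field, and that your deployment expects a private API key in an Authorization header (the requests above omit it). ES_URL, ENGINE, API_KEY and the helper function names are placeholders I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: add a crawler domain via the API, then attach an entry point to it.
# ES_URL, ENGINE and API_KEY are hypothetical placeholders; adjust them.
set -euo pipefail

ES_URL="${ES_URL:-http://localhost:3002}"
ENGINE="${ENGINE:-my-engine}"
API_KEY="${API_KEY:-private-xxxxxxxx}"

# Build the crawler domains endpoint for the configured engine.
domains_url() {
  printf '%s/api/as/v0/engines/%s/crawler/domains' "$ES_URL" "$ENGINE"
}

# Build a one-field JSON body; only safe for values without quotes or backslashes.
json_field() {
  printf '{"%s": "%s"}' "$1" "$2"
}

# POST a JSON body to a URL; -f makes curl exit non-zero on HTTP errors.
post_json() {
  curl -fsS -X POST "$1" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$2"
}

main() {
  local domain_id
  # Assumption: the response JSON has the new domain's id at the top level.
  domain_id="$(post_json "$(domains_url)" "$(json_field name "http://www.test.de")" | jq -r '.id')"
  post_json "$(domains_url)/$domain_id/entry_points" "$(json_field value "/ressources/")"
}

# Only send requests when explicitly asked to, e.g. `bash add_domain.sh run`.
if [[ "${1:-}" == "run" ]]; then
  main
fi
```

Save it under any name (add_domain.sh here is arbitrary) and call it with the "run" argument once the three variables point at your deployment.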

Thank you :) I'd propose a UI change to make this more convenient.

Furthermore, adding a domain via curl and via the UI should have the same result. I guess the UI is using a different API?