Hello,
I'm trying to crawl a website (www.werk.nl) but the crawler gets a 599 error on robots.txt.
If I use my browser, I can open robots.txt and the website without problems.
What can be the issue?
Marten
Btw, I'm using App Search 7.13.4.
Hi Marten, a 599 status code can indicate a network connection timeout error. Can you validate that your server is not taking an undue amount of time when responding to App Search's request to robots.txt?
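One way to check this from the right vantage point: a minimal sketch in Python, run from the machine hosting App Search, since browser timings from another network don't tell you what that host sees. The URL is taken from the thread; the 30-second timeout is an assumption, not the crawler's actual setting.

```python
import time
import urllib.request

# Hypothetical timing check, run from the host where App Search runs.
url = "https://www.werk.nl/robots.txt"  # URL from the thread

start = time.monotonic()
try:
    # 30s timeout is an assumed value, not the crawler's configured one
    with urllib.request.urlopen(url, timeout=30) as resp:
        status = resp.status
        size = len(resp.read())
    print(f"HTTP {status}, {size} bytes in {time.monotonic() - start:.2f}s")
except Exception as exc:
    print(f"Failed after {time.monotonic() - start:.2f}s: {exc}")
```

If this prints a fast 200 from the crawler host itself, the slow-server theory is unlikely and the problem is more likely between the crawler process and the site.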
Hi Byron,
Please define "undue amount".
If I access robots.txt from my browser, I get a response immediately.
In the crawler's logging I see that the entry with status code 599 has exactly the same @timestamp as the crawler-start entry, so the crawler doesn't seem to wait for robots.txt at all.
Marten
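An identical @timestamp suggests an immediate connection or DNS failure from the crawler host rather than a slow response. A minimal sketch to test raw reachability from that host (hostname from the thread; the 10-second timeout is an assumption):

```python
import socket

# Hypothetical reachability check from the host running the crawler:
# an instant 599 often points at DNS or firewall issues on that host,
# not a slow web server.
host, port = "www.werk.nl", 443

try:
    # Resolve the hostname first, so DNS failures are distinguishable
    addr = socket.getaddrinfo(host, port)[0][4]
    print(f"DNS resolved {host} -> {addr[0]}")
    # Then try to open a plain TCP connection (timeout is an assumption)
    with socket.create_connection((host, port), timeout=10):
        print(f"TCP connection to {host}:{port} succeeded")
except socket.gaierror as exc:
    print(f"DNS resolution failed: {exc}")
except OSError as exc:
    print(f"TCP connection failed: {exc}")
```

If DNS resolution or the TCP connect fails here but works from the browser's machine, that points at the crawler host's resolver or outbound firewall rules.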
I will transfer this issue to the support portal; this one can be closed.