The Enterprise Search web crawler is logging the error below:
Allow none because robots.txt responded with status 599
Error: read_timeout
What could be the potential issue, and what is the suggested resolution?
I see the exceptions above while adding a domain. I am able to access the domain URL and its robots.txt file through a browser.
The issue happens for one specific public domain only; I am able to crawl other public websites.
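One difference between the browser test and the crawler is the HTTP proxy configured below: the crawler fetches robots.txt through the proxy, while a browser on your workstation may not. A quick way to reproduce the crawler's view is to run curl from the Enterprise Search host through the same proxy. This is a hedged sketch; proxy.internal:80 and example.com are placeholders for your actual proxy and the failing domain:

```
# Fetch robots.txt the same way the crawler does: from the EES host,
# through the configured HTTP proxy. Replace the placeholders.
curl -sv \
  --connect-timeout 10 \
  --max-time 30 \
  -x http://proxy.internal:80 \
  -o /dev/null \
  https://example.com/robots.txt
```

If this hangs or times out while a direct (no `-x`) fetch succeeds, the problem is between the proxy and that one origin (e.g. the site blocking the proxy's egress IP or a firewall rule), not the crawler itself.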
Below is the EES config:
allow_es_settings_modification: true
elasticsearch.host: https://xxxxxxxxxxxxx:9200
elasticsearch.ssl.enabled: true
elasticsearch.ssl.verify: false
kibana.host: http://xxxxxxxx:5601
ent_search.listen_host: 0.0.0.0
ent_search.listen_port: 3002
connector.crawler.http.proxy.host: xxxxxxxxxxxx
connector.crawler.http.proxy.port: 80
connector.crawler.http.proxy.protocol: http
connector.crawler.security.dns.allow_private_networks_access: true
connector.crawler.security.dns.allow_loopback_access: true
connector.crawler.content_extraction.enabled: true
connector.crawler.content_extraction.mime_types: ["application/pdf", "application/msword", "text/plain", "application/xml", "text/html", "text/css"]
crawler.http.proxy.host: xxxxxxxxxxxx
crawler.http.proxy.port: 80
crawler.http.proxy.protocol: http
crawler.security.dns.allow_loopback_access: true
crawler.security.dns.allow_private_networks_access: true
crawler.content_extraction.enabled: true
crawler.content_extraction.mime_types: ["application/pdf", "application/msword", "text/plain", "application/xml", "text/html", "text/css"]
ent_search.ssl.enabled: false
crawler.security.ssl.verification_mode: none
connector.crawler.security.ssl.verification_mode: none
crawler.http.request_timeout: 90
crawler.http.read_timeout: 30
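Note that a status of 599 is not returned by the remote site; the crawler substitutes it when the robots.txt request fails at the transport level (here, `read_timeout`), and then disallows everything for safety. If the origin is merely slow through the proxy, raising the crawler timeouts is worth an experiment. The fragment below is illustrative only, not a recommendation; `crawler.http.connection_timeout`, `crawler.http.read_timeout`, and `crawler.http.request_timeout` are existing Enterprise Search crawler settings, but the values are assumptions to tune for your environment:

```
# Illustrative values only -- tune for your environment.
crawler.http.connection_timeout: 30   # seconds to establish the connection
crawler.http.read_timeout: 120        # seconds to wait while reading the response
crawler.http.request_timeout: 300     # overall ceiling for a single request
```

Since the config above sets both `crawler.http.*` and `connector.crawler.http.*` prefixes, any change would presumably need to be mirrored under both.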