Enterprise Search Web Crawler fails on authn enabled websites

ymoriarty · September 24, 2021, 3:46pm

The web crawler feature worked great for an Apache website with no authentication. Once I enabled basic authentication, the web crawler failed. Is there a way to use the web crawler on sites that have authentication enabled or that use SAML SSO?

ymoriarty · September 28, 2021, 3:36pm

Following up on this as there have been no replies. Thanks.

Carlos_D · September 30, 2021, 9:37am

Hi @ymoriarty !

Support for crawling authenticated sites is on the roadmap. In the meantime, a workaround is to configure the website so that requests carrying the crawler's specific user-agent string bypass authentication, if that is possible in your setup.

You can set the crawler.http.user_agent in the Enterprise Search configuration. Please take a look at the configuration documentation.
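To make the workaround a bit more concrete, here is a rough sketch of what the website side could look like, written as a small Python WSGI middleware. This is only an illustration under assumptions, not how Apache or Enterprise Search implement anything: the names `CRAWLER_USER_AGENT`, `USERNAME`, `PASSWORD`, `is_authorized` and `basic_auth_with_crawler_bypass` are all made up for this example, and the user-agent value is assumed to be whatever you set in crawler.http.user_agent. Keep in mind that any client can send the same User-Agent header, so a hard-to-guess value (and ideally an additional restriction such as source IP) is advisable.

```python
import base64
import hmac

# Hypothetical values, for illustration only.
# CRAWLER_USER_AGENT is assumed to match the crawler.http.user_agent setting.
CRAWLER_USER_AGENT = "Example-Crawler-7f3a9c"
USERNAME = "demo"
PASSWORD = "secret"


def _equal(a: str, b: str) -> bool:
    """Compare two strings without leaking timing information."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def is_authorized(user_agent, authorization) -> bool:
    """Allow the request if it comes from the crawler's user agent
    or carries valid HTTP Basic credentials."""
    if user_agent is not None and _equal(user_agent, CRAWLER_USER_AGENT):
        return True  # crawler bypasses authentication
    if not authorization or not authorization.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(authorization[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False  # malformed header
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    return _equal(user, USERNAME) and _equal(password, PASSWORD)


def basic_auth_with_crawler_bypass(app):
    """Wrap a WSGI app with Basic auth, skipping it for the crawler."""
    def middleware(environ, start_response):
        if is_authorized(environ.get("HTTP_USER_AGENT"),
                         environ.get("HTTP_AUTHORIZATION")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="restricted"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required\n"]
    return middleware
```

Wrapping an existing WSGI application would then look like `application = basic_auth_with_crawler_bypass(application)`. On a web server such as Apache, the same idea would normally be expressed in the server's own access-control configuration rather than in application code.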

Stay tuned for upcoming releases!
