Hi Team,
I am trying to crawl a website that uses NTLM authentication, but I am not able to crawl it. I can't see any option in the UI to add authentication details for the website, and the crawl API offers only basic and raw as auth types.
It's not crawling any documents from our website.
You can update the domain with auth type raw, and the value will be used directly in the Authorization header (NTLM is one of the supported authentication schemes).
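To illustrate the difference between the two auth types, here is a minimal sketch of how each would turn into an Authorization header value. This is not the crawler's own code: the function name and the `username`, `password`, and `header` field names are assumptions made for the example, and the header value shown is a placeholder.

```python
import base64


def authorization_header(auth):
    """Return the Authorization header value for a hypothetical auth setting.

    `auth` is a plain dict; its field names are assumed for illustration.
    """
    if auth["type"] == "basic":
        # Standard HTTP Basic: base64 of "username:password" with a "Basic " prefix.
        credentials = f'{auth["username"]}:{auth["password"]}'.encode("utf-8")
        return "Basic " + base64.b64encode(credentials).decode("ascii")
    if auth["type"] == "raw":
        # Raw: the configured value is sent verbatim, scheme name included.
        return auth["header"]
    raise ValueError(f'unsupported auth type: {auth["type"]!r}')


if __name__ == "__main__":
    # Placeholder values only; substitute whatever your server expects.
    print(authorization_header({"type": "basic", "username": "user", "password": "pass"}))
    print(authorization_header({"type": "raw", "header": "NTLM <base64-token>"}))
```

With raw, nothing is added or encoded for you, so the value has to be exactly what the server expects to see in the header.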
Unexpected content type for a crawl task with type=content
When the crawler logs "Unexpected content type", it means it doesn't support, or couldn't recognize, the response's Content-Type header. Could you please share the URL if it can be accessed via the public internet, or run a basic curl command like this:
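The same check can also be sketched in Python, in case curl is not available. This is a minimal example using only the standard library; the URL is a placeholder and both function names are made up for illustration. It sends a HEAD request and reports the status code and Content-Type header, including for error responses such as a 401 from a site that requires authentication.

```python
import urllib.error
import urllib.request


def parse_content_type(header_value):
    """Split a Content-Type header into (media_type, params)."""
    if not header_value:
        return None, {}
    parts = [part.strip() for part in header_value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip().lower()] = value.strip().strip('"')
    return media_type, params


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return (status_code, raw Content-Type header)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry headers worth inspecting.
        return err.code, err.headers.get("Content-Type")


if __name__ == "__main__":
    # Placeholder URL; replace it with the page the crawler fails on.
    status, raw_header = fetch_content_type("https://example.com/")
    print(status, raw_header, parse_content_type(raw_header))
```

A missing header, or a media type other than the HTML you expect for that page, would be consistent with the log message above.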
Hi Dimitrii,
It seems the issue was with how the domain was accepting authentication. The application team set up a proxy that takes care of authentication, and I am now able to crawl the domain's pages properly.