Use NTLM authentication while crawling domains

Hi Team,
I am trying to crawl a website which uses NTLM authentication, but I am not able to crawl it. I can't see any option in the UI to add authentication details for the website, and in the crawl API we have only basic and raw as auth types.
It is not crawling any documents from our website.

Please suggest.

Thanks,
Disha

Hi Disha,

Unfortunately, there's no option to configure auth in the UI; you have to do this via the API:
https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-http-authentication

You can update the domain with auth type raw; the value will be used directly in the Authorization header (NTLM is one of the supported authentication schemes).
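For illustration, here is a sketch of what the request body for that domain update could look like. The field names ("auth", "type", "value") and the placeholder token are assumptions based on the description above; please verify the exact request shape and endpoint against the linked reference for your version.

```shell
# Sketch: build the request body for updating a crawler domain with a raw
# Authorization header. "value" as the field name is an assumption; the
# NTLM token is a placeholder.
BODY='{"auth": {"type": "raw", "value": "NTLM <token>"}}'
echo "$BODY"

# Then send it to the domain update endpoint (placeholders: host, engine
# name, domain ID, and private API key):
# curl -X PUT "http://localhost:3002/api/as/v1/engines/ENGINE/crawler/domains/DOMAIN_ID" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer private-xxxxxxxx" \
#   -d "$BODY"
```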

Hi Chenhui_Wang,
Do I need to configure the NTLM server as an HTTP proxy in enterprise_search.yml, like below?

Also, can we use both App Search authentication and website authorization in the same update domain API?

crawler.http.proxy.host: auth.example.com
crawler.http.proxy.port: 443
crawler.http.proxy.protocol: https
crawler.http.proxy.username: username
crawler.http.proxy.password: password

Hi Team,
The crawler event logs are showing event.type: denied with the message below:
Unexpected content type for a crawl task with type=content

What does it mean?

Hi @Disha_Bodade,

Unexpected content type for a crawl task with type=content

When the Crawler logs an unexpected content type, it means it doesn't support, or couldn't recognize, the response's Content-Type header. Could you please share the URL if it can be accessed via the public internet, or run a basic curl command like this:

curl -i {denied_url}

and include the output.

You can use enterprise_search.yml; however, if you configure a proxy server this way, all your crawlers will use those proxy settings.

If you want a proxy configuration per domain, you can use the Crawler API to add your configuration.

Our application team has enabled basic auth for the domain, but when I try to crawl it, it is still showing:

"fetch": {
    "timestamp": "Fri, 07 Apr 2023 16:04:15 +0000",
    "event_id": "64303effe4f766fe7de5b7ff",
    "message": "Unexpected content type  for a crawl task with type=content",
    "event_outcome": "failure",
    "duration_msec": 0.00596,
    "http_response": {
        "status_code": 302,
        "body_bytes": 0
    },
    "redirect": null
}

Unexpected content type for a crawl task with type=content. I guess there is some misconfiguration on my side.

Have you configured your Crawler with the basic auth credentials?
If you are using the Elastic web crawler, see Elastic web crawler - Managing crawls in Kibana | Elastic Enterprise Search documentation [8.7] | Elastic.

If you are using the App Search crawler, see Web crawler reference | Elastic App Search Documentation [8.7] | Elastic.

Could you please verify that your website is accessible via curl or similar tools using basic auth?
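As a quick sanity check, you can build the Basic Authorization header value by hand and then hit the site with curl. The credentials and URL below are placeholders; substitute your own.

```shell
# Build the value of a Basic Authorization header from placeholder
# credentials (user:password, base64-encoded).
CREDS=$(printf '%s' 'user:password' | base64)
echo "Authorization: Basic ${CREDS}"

# Then verify the site accepts it (replace the URL with your domain):
# curl -i -u user:password https://your-site.example.com/
```

If the curl request returns 200 with the page body, the crawler should be able to fetch it with the same credentials; a 302 or 401 here points at the server-side auth setup instead.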

Hi Dimitrii,
It seems the issue was with how the domain was accepting authentication. The application team has set up a proxy that takes care of authentication, and I am now able to crawl the domain pages properly.

Thanks,
Disha

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.