Crawling authenticated web sites - Cookies

Hi all,

We are trying to crawl pages that require Basic Authentication. Once a request is authenticated, a session is established on the server and a cookie is returned in the response. This cookie identifies the user on the next request, so no new session is generated.

How does the Elastic Web Crawler handle such authentication cookies? Does the crawler reuse these cookies once they have been returned after the first authenticated crawl request?

Or does each request authenticate anew, potentially with a new session? That would create thousands of sessions within a very short time period, which may cause problems.

Regards

Sebastian

Hi @sebastianboelling,

The Elastic Web Crawler doesn't keep cookies set by the server. If you configure Basic Authentication, the crawler applies the authentication header to all requests.

This would raise thousands of sessions during a very short time period which may raise problems.

Similar to other search bots, the Elastic Web Crawler uses a stateless approach, so it doesn't require a session. Session-based authentication is not supported.
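
To illustrate what "stateless" means here, below is a minimal Python sketch of a client that attaches the same Basic Authentication header to every request and never stores or sends cookies. This is not the crawler's actual code; the function names `basic_auth_header` and `build_stateless_requests` and the example.com URLs are made up for illustration, and nothing is sent over the network.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


def build_stateless_requests(urls, username, password):
    """Prepare one request per URL, each carrying the same credentials.

    No cookie jar is involved: a Set-Cookie header in an earlier response
    would have no effect on the requests built here.
    """
    header = basic_auth_header(username, password)
    requests = []
    for url in urls:
        req = urllib.request.Request(url)
        req.add_header("Authorization", header)
        requests.append(req)
    return requests


# Hypothetical URLs and credentials, for illustration only.
reqs = build_stateless_requests(
    ["https://example.com/a", "https://example.com/b"],
    "Aladdin",
    "open sesame",
)
for req in reqs:
    print(req.full_url, req.get_header("Authorization"), req.has_header("Cookie"))
```

Every prepared request carries the credentials and none carries a `Cookie` header, so each one is authenticated on its own.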
