We are trying to crawl pages that require Basic Authentication. Once a request is authenticated, the server establishes a session and returns a cookie. This cookie identifies the user on subsequent requests, so no new session is created.
How does the Elastic Web Crawler handle such authentication cookies? Does the crawler reuse these cookies once they are returned after the first authenticated crawl request?
Or does each request authenticate anew, potentially with a new session? That would create thousands of sessions within a very short period, which may cause problems.
The Elastic Web Crawler doesn't keep cookies set by the server. If you configure Basic Authentication, the crawler applies the authentication header to every request.
This would raise thousands of sessions during a very short time period which may raise problems.
Similar to other search bots, the Elastic Web Crawler uses a stateless approach, so it doesn't require a session. Session-based authentication is not supported.
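To make "stateless" concrete, here is a rough Python sketch of what sending the Basic Authentication header on every request looks like. This is not the crawler's actual code; the function names, URLs, and credentials are made up for illustration. Each request carries the same `Authorization` header, and no `Cookie` header is stored or replayed.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Authentication header (RFC 7617 style)."""
    # Credentials are joined with a colon and base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def plan_requests(urls: list[str], username: str, password: str) -> list[dict]:
    """Hypothetical helper: describe the requests a stateless client would send."""
    # Every request gets the auth header; cookies from earlier responses are ignored.
    return [{"url": url, "headers": basic_auth_header(username, password)} for url in urls]


if __name__ == "__main__":
    for request in plan_requests(
        ["https://example.com/a", "https://example.com/b"], "user", "pass"
    ):
        print(request["url"], request["headers"])
```

If the server opens a new session for every request that arrives without a session cookie, that has to be addressed on the server side, for example by not creating sessions for requests that authenticate via the header.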