Hi,
I am using the recently released Elastic Crawler project, and I cannot find any configuration option for adding a delay between requests while crawling a domain.
Is there any config to set a number to have a specific interval between requests?
Thanks
Hi @jahedi!
The Open Crawler doesn't currently support delays between crawl requests.
We're aware of this gap in functionality so we have this enhancement issue to keep track of it: Abide by crawl delays found in robots.txt · Issue #46 · elastic/crawler · GitHub
That issue specifically covers having the crawler respect the robots.txt Crawl-delay setting, so using it would require users to have control over the web server hosting the site. Would this feature resolve your issue, or do you need a different kind of delay?
Thanks for your response @nfeekery
Yes, I had seen this solution with the Crawl-delay rule in the robots.txt file. The problem is that some websites may not have a robots.txt file, or may not set the Crawl-delay option in it. I was looking for a more general option...
I see how a general delay would be useful. Unfortunately Open Crawler can't do this yet. I've created a new issue to track this: Add a general delay between crawl requests · Issue #49 · elastic/crawler · GitHub
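For anyone landing here in the meantime, the behaviour discussed in this thread (use the site's Crawl-delay when robots.txt provides one, otherwise fall back to a general delay) can be sketched outside Open Crawler. This is only an illustration in plain Python using the standard library's `urllib.robotparser`; it is not Open Crawler code or configuration, and the names `effective_delay`, `crawl_with_delay`, `fallback_seconds`, and the `my-crawler` user agent are made up for the example.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable, List


def effective_delay(robots_txt: str, user_agent: str, fallback_seconds: float) -> float:
    """Return the Crawl-delay that applies to user_agent, or the fallback.

    robots_txt is the raw file content; pass an empty string when the site
    has no robots.txt at all.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else fallback_seconds


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing delay_seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the very first request
        results.append(fetch(url))
    return results


# Hypothetical robots.txt contents, one with and one without Crawl-delay.
ROBOTS_WITH_DELAY = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
ROBOTS_WITHOUT_DELAY = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    for robots in (ROBOTS_WITH_DELAY, ROBOTS_WITHOUT_DELAY, ""):
        print(effective_delay(robots, "my-crawler", fallback_seconds=2.0))
```

The `fetch` and `sleep` callables are passed in so the pacing logic can be tried without making real network requests.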