Some websites, such as http://www.gilacountyaz.gov/government/assessor/index.php, have many internal links that should be absolute paths but are missing the leading slash.
This causes a web crawler to generate wrong links: instead of resolving such a link against the site root, the crawler resolves it against the directory of the current page.
This can create an infinite loop and a lot of 404 errors.
Web browsers like Firefox or Chrome handle this correctly, because there is a <base> tag present on the website:
<head> <base href="http://www.gilacountyaz.gov/index.php"/> </head>
It allows the browser to interpret these links correctly, but the web crawler ignores it. Is there a quick workaround that will make the web crawler work correctly?
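
To make the expected behavior concrete, here is a minimal sketch in Python (standard library only) of what honoring the <base> tag would look like. It is not tied to any particular crawler; the names BaseAwareLinkParser and extract_links are made up for illustration, and the sample link "government/assessor/index.php" is a hypothetical href, not one copied from the site. How to hook something like this into an actual crawler depends on which crawler is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAwareLinkParser(HTMLParser):
    """Collects <a href> values and the first <base href> found in a page.

    Hypothetical helper, written only to illustrate the idea.
    """

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            # Only the first <base href> in a document counts.
            self.base_href = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def extract_links(page_url, html):
    """Return absolute URLs for all links, resolved against <base> if present."""
    parser = BaseAwareLinkParser()
    parser.feed(html)
    # The base href may itself be relative, so resolve it against the page URL.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.hrefs]


if __name__ == "__main__":
    page_url = "http://www.gilacountyaz.gov/government/assessor/index.php"
    html = (
        '<head> <base href="http://www.gilacountyaz.gov/index.php"/> </head>'
        '<body><a href="government/assessor/index.php">Assessor</a></body>'
    )
    print(extract_links(page_url, html))
```

With the <base> tag honored, the sample link resolves against the site root. Without it, the same href would be joined onto /government/assessor/ and the path would grow with every crawl step, which matches the looping behavior described above.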