To make it work for a customer, we had them remove limits from a proxy server so that we could scrape content from their website. However, I am not sure how to make sure that crawler uses that proxy server URL as the gateway to get to the customer's website.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.