Web crawler (github.com/elastic/crawler) search behind corporate proxy

I am trying to run the crawler from the elastic/crawler repository on GitHub. It works great for public websites.

To make it work for a customer, we had them remove limits from a proxy server so that we could scrape content from their website. However, I am not sure how to make the crawler use that proxy server as the gateway to reach the customer's website.

Hi @Vikram_Tiwari

The Crawler has proxy settings that you can configure. See the example config file `config/crawler.yml.example` in the elastic/crawler repository (at commit d3f1bd30eb791a218c62a0c32f06a3c6bbf880e9).

Can you check if configuring these allows crawling through the proxy server?
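
For reference, here is a rough sketch of what the proxy section of a crawler config might look like. I am assuming the `http_proxy.*` key names as I recall them from the example file, and the host, port, and credentials are placeholders, so please verify the exact keys against the linked file for your version.

```yaml
# Sketch only: key names assumed from crawler.yml.example, values are placeholders.
# Host of the customer's proxy server (placeholder)
http_proxy.host: proxy.example.com
# Port the proxy listens on (placeholder)
http_proxy.port: 8888
# Protocol used to talk to the proxy
http_proxy.protocol: http
# Only needed if the proxy requires authentication (placeholders)
http_proxy.username: proxy-user
http_proxy.password: changeme
```

These settings go in the crawler's own YAML config rather than in the Docker setup, so every connection the crawler makes should be routed through the proxy.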

Awesome! This fixed it. I was expecting the setting to be at the Docker level of the crawler, but this is much better.
