Web crawler (github.com/elastic/crawler) search behind corporate proxy

Vikram_Tiwari · March 10, 2025, 7:32pm

I am trying to run elastic/crawler (from GitHub). It works great for public websites.

To make it work for a customer, we had them remove limits from a proxy server so that we could scrape content from their website. However, I am not sure how to make the crawler use that proxy server URL as the gateway to reach the customer's website.

nfeekery · March 11, 2025, 10:57am

Hi @Vikram_Tiwari

The crawler has proxy settings that you can configure. See the example config: crawler/config/crawler.yml.example at d3f1bd30eb791a218c62a0c32f06a3c6bbf880e9 · elastic/crawler · GitHub

Can you check if configuring these allows crawling through the proxy server?
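
For reference, a minimal sketch of what the proxy section of a crawl config might look like. The key names below are assumed from memory of the example file and should be checked against the linked crawler.yml.example; the hostnames, port, and credentials are hypothetical placeholders.

```yaml
# Sketch only: verify key names against crawler.yml.example before use.
domains:
  - url: https://intranet.customer.example   # hypothetical customer site

# Proxy the crawler should route HTTP requests through (assumed key names)
http_proxy.host: proxy.customer.example      # hypothetical proxy hostname
http_proxy.port: 8080                        # hypothetical proxy port
http_proxy.protocol: http

# Only needed if the proxy requires authentication (assumed key names)
# http_proxy.username: crawler-user
# http_proxy.password: changeme
```

Because these settings live in the crawl config file rather than in Docker environment variables, each crawl config can point at a different proxy.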

Vikram_Tiwari · March 11, 2025, 5:49pm

Awesome! This fixed it. I was expecting the setting to be at the crawler's Docker level, but this is much better.

system · April 8, 2025, 5:49pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
