Add site to crawler via API

I'd like to add sites to search via the API, which AFAICT isn't currently possible officially. Is there:

  • a way to do this unofficially?
  • plans to do this eventually?

@hilt86 :wave:

Assuming you're talking about the web crawler beta released with 7.11 - we do have plans to extend the documented APIs to support the same functionality the web crawler offers in the dashboard.

As for unofficial ways to do this: you could use your browser's developer tools to find the private endpoints the dashboard calls, and then, if you're using a non-SAML auth source, send a request with basic auth to that endpoint. This is obviously not recommended, and would most likely break in upcoming releases :smirk:
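For illustration only, here is a minimal Python sketch of what such a basic auth request could look like. Everything specific in it is an assumption: the endpoint path is a placeholder (the real one has to be read from the developer tools network tab and is undocumented), the payload shape is hypothetical, and the host and credentials are made up. The same caveat applies: this is unsupported and may stop working after an upgrade.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: copy the real path from the browser's developer
# tools. It is a private, undocumented endpoint and may change between releases.
PRIVATE_ENDPOINT = "http://localhost:3002/REPLACE/WITH/PATH/SEEN/IN/DEVTOOLS"


def build_basic_auth_request(url, username, password, payload):
    """Build (but do not send) a JSON POST request that uses HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The payload shape is an assumption; mirror whatever the dashboard sends.
    request = build_basic_auth_request(
        PRIVATE_ENDPOINT, "some_user", "some_password", {"name": "https://example.com"}
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to compare the headers and body against what the dashboard sends before anything hits the server.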

OK, thanks - how does Elastic App Search work with the users and roles in Kibana? I want to create an account that can only manage App Search and the indexes created therein.

OK, I managed to add 42000 sites to the crawler and it didn't really appreciate that... Aside from the fact that the crawler UI doesn't paginate, the real issue was that initiating a crawl started and then promptly died!

I need to index 50k+ sites - is the crawler cut out for this?

Can you clarify: Are you trying to index 50K different sites (website domains), or 50K different pages within one website?

The crawler should be able to crawl 50K pages on a website, but you'll need to increase these settings in enterprise-search.yml:

#crawler.crawl.max_unique_url_count.limit: 10000
#crawler.crawl.max_duration.limit: 86400 # seconds

As for 50K sites... This is an interesting use case! I'd love to learn more about it. I don't know any reason why it wouldn't work, but I have not tried this many.

initiating a crawl started and then promptly died!

Could you share any error messages from the application logs under log/crawler.log?
Have you configured any crawl rules for any sites? If the crawl rules reject the entry point URL itself, the crawler will start and stop immediately, thinking it has nothing to do.
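To make the crawl rule point concrete, here is a rough Python sketch of how an entry point can end up rejected. It is a simplified model, not the crawler's actual implementation: it assumes rules are evaluated in order against the URL path, that the first matching rule decides, and that a URL matching no rule is allowed. The rule dictionaries and function names are invented for the example.

```python
import re
from urllib.parse import urlparse


def rule_matches(rule, path):
    """Return True if a single crawl rule matches the URL path."""
    kind, pattern = rule["rule"], rule["pattern"]
    if kind == "begins":
        return path.startswith(pattern)
    if kind == "ends":
        return path.endswith(pattern)
    if kind == "contains":
        return pattern in path
    if kind == "regex":
        return re.search(pattern, path) is not None
    raise ValueError(f"unknown rule type: {kind}")


def entry_point_allowed(entry_point_url, rules):
    """Apply rules in order; the first match decides (assumed semantics)."""
    path = urlparse(entry_point_url).path or "/"
    for rule in rules:
        if rule_matches(rule, path):
            return rule["policy"] == "allow"
    return True  # assumption: no matching rule means the URL is allowed


if __name__ == "__main__":
    rules = [
        {"policy": "allow", "rule": "begins", "pattern": "/blog"},
        {"policy": "deny", "rule": "regex", "pattern": ".*"},
    ]
    for url in ("https://example.com/", "https://example.com/blog/"):
        verdict = "allowed" if entry_point_allowed(url, rules) else "rejected"
        print(url, verdict)
```

With these example rules, an entry point of `https://example.com/` falls through to the catch-all deny rule, which is the situation where a crawl would start and stop immediately.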

OK, I will try that! I'm trying to index 50k different sites for an OSINT exercise for internal consumption. For example, I want to index a bunch of sites and get notified if certain snippets of text appear on them (for which I will use the API once I have a quality index).
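A rough Python sketch of that notification idea, assuming the documented App Search search endpoint (`/api/as/v1/engines/{engine_name}/search`) with a bearer API key. The engine name, host, key, and snippets are made up, and the response shape (`meta.page.total_results`) should be checked against the docs for the version in use. A plain query is a relevance search, not necessarily an exact phrase match, so hits would still need a review step.

```python
import json
import urllib.request


def build_search_request(base_url, engine_name, api_key, snippet):
    """Build (but do not send) an App Search search request for one snippet."""
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine_name}/search"
    body = {"query": snippet, "page": {"size": 10}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def snippets_with_hits(responses):
    """Given {snippet: parsed JSON response}, return snippets with any result."""
    return [
        snippet
        for snippet, response in responses.items()
        if response.get("meta", {}).get("page", {}).get("total_results", 0) > 0
    ]


if __name__ == "__main__":
    # Hypothetical engine name, host, and key.
    request = build_search_request(
        "http://localhost:3002", "osint", "private-xxxx", "some watched phrase"
    )
    print(request.full_url)
    # Canned responses stand in for json.load(urllib.request.urlopen(request)).
    canned = {
        "some watched phrase": {"meta": {"page": {"total_results": 2}}, "results": []},
        "another phrase": {"meta": {"page": {"total_results": 0}}, "results": []},
    }
    print(snippets_with_hits(canned))
```

Run on a schedule, the list returned by `snippets_with_hits` could be handed to whatever alerting channel is in use.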
