Crawling 100k domain names

Hello,

I'm interested to know if it is possible to crawl ~100k domain names. If so, is it possible to add or remove domains through an API?

I would also be very interested in the possibility of scheduling the crawling (e.g. monthly or twice a month).

Tamara from the last crawler webinar told me that @oleksiy-elastic would be interested in this challenge!

Regards,

Rémi Poulenard

Hello, Remi,

I'm interested to know if it is possible to crawl ~100k domain names.

This sounds like an interesting challenge indeed! There is nothing within the crawler that would prevent us from being able to achieve scale like this, but I am sure we'd find some rough edges if we were to try it. That is why I'd like to try and see what we could find. Could you provide us with a bit more information on your use case to help us model your configuration?

  • What are the domains you're trying to crawl? (if you can share a list here or via email, it'd be extremely helpful)
  • How many pages do you expect each domain to have (rough estimate, average, any information would be useful).
  • How often do you expect the content to change on those domains?
  • Do you have sitemaps on your domains?

If so, is it possible to add or remove domains through an API?

With the 7.13.0 release you should be able to manage all aspects of the crawler configuration through an API. The APIs are in beta, so documentation is being actively worked on at the moment. You can find the current version of the docs here: Web crawler API (beta) reference | Elastic App Search Documentation [7.16] | Elastic.
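To make the add / remove flow concrete, here is a minimal Python sketch of what such calls might look like. It only builds the HTTP requests and does not send them. The `/api/as/v0` prefix, the `crawler/domains` path, the `{"name": ...}` body, and the bearer-token header are my assumptions based on the beta reference and may differ in your version, so please check them against the docs linked above. The function names, host, engine name, and key are hypothetical.

```python
import json
import urllib.request

# Assumption: path prefix as recalled from the beta docs; later versions may
# use a different prefix. Verify against the API reference for your version.
API_PREFIX = "/api/as/v0"


def add_domain_request(base_url, engine, api_key, domain_url):
    """Build (but do not send) a request that adds one domain to the crawler."""
    url = f"{base_url.rstrip('/')}{API_PREFIX}/engines/{engine}/crawler/domains"
    body = json.dumps({"name": domain_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def delete_domain_request(base_url, engine, api_key, domain_id):
    """Build (but do not send) a request that removes a domain by its ID."""
    url = (
        f"{base_url.rstrip('/')}{API_PREFIX}"
        f"/engines/{engine}/crawler/domains/{domain_id}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

A request built this way can be sent with `urllib.request.urlopen(req)`. For ~100k domains you would presumably loop over your list and throttle the calls; the sketch leaves that out on purpose.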

I would also be very interested in the possibility of scheduling the crawling (e.g. monthly or twice a month).

With the 7.13.0 release today you should be able to configure automatic crawling with a specified frequency. You can find the API docs here: Web crawler API reference | App Search documentation [8.11] | Elastic or you can do it through the UI.
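As a rough illustration of the scheduling call, the sketch below builds (without sending) a request that sets a crawl frequency. The `crawler/crawl_schedule` path, the `PUT` method, the `frequency` / `unit` body fields, and the list of allowed units are assumptions from my reading of the API reference, so treat them as something to verify rather than as the definitive contract. The function name, host, engine name, and key are hypothetical.

```python
import json
import urllib.request

# Assumptions: path prefix and allowed units; check both against the docs.
API_PREFIX = "/api/as/v0"
ALLOWED_UNITS = {"hour", "day", "week", "month"}


def crawl_schedule_request(base_url, engine, api_key, frequency, unit):
    """Build (but do not send) a request that sets an automatic crawl schedule."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    if not isinstance(frequency, int) or frequency < 1:
        raise ValueError("frequency must be a positive integer")
    url = (
        f"{base_url.rstrip('/')}{API_PREFIX}"
        f"/engines/{engine}/crawler/crawl_schedule"
    )
    body = json.dumps({"frequency": frequency, "unit": unit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Monthly crawl: every 1 month.
monthly = crawl_schedule_request(
    "http://localhost:3002", "my-engine", "private-key", 1, "month"
)
# "Two times a month" approximated as every 2 weeks.
twice_monthly = crawl_schedule_request(
    "http://localhost:3002", "my-engine", "private-key", 2, "week"
)
```

Note that "every 2 weeks" is only an approximation of "twice a month"; a frequency-plus-unit schedule like this cannot pin crawls to specific calendar days.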

Thank you for trying the Enterprise Search crawler!

Hello Oleksiy,

First of all thank you for all the insights you’ve shared.

We would be glad to introduce you to our business and discuss possible use cases for the App Search crawler.

Do you have an email address where we could send you some context?

Regards,

Rémi

Sorry for the delay with my responses; I had some weird notification issues on my end :frowning:

You can email me at oleksiy.kovyrin@elastic.co or drop by our community slack if that may be easier (https://elasticstack.slack.com).