I'm interested to know if it is possible to crawl ~100k domain names.
This sounds like an interesting challenge indeed! Nothing within the crawler would prevent us from reaching that scale, but I'm sure we'd find some rough edges if we tried, which is exactly why I'd like to give it a shot and see what we find. Could you provide a bit more information on your use case to help us model your configuration?
- What are the domains you're trying to crawl? (if you can share a list here or via email, it'd be extremely helpful)
- How many pages do you expect each domain to have? (a rough estimate or average; any information would be useful)
- How often do you expect the content to change on those domains?
- Do you have sitemaps on your domains?
Also, is it possible to add / remove domains using an API?
With the 7.13.0 release you should be able to manage all aspects of the crawler configuration through an API. The APIs are in beta, so documentation is being actively worked on at the moment. You can find the current version of the docs here: Web crawler (beta) API reference | Elastic App Search Documentation [7.x] | Elastic.
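To illustrate, here is a minimal sketch of what scripting bulk domain management could look like. The endpoint paths, payload fields, and base URL below are assumptions based on the beta docs, so please verify them against the API reference linked above before relying on them:

```python
# A minimal sketch of managing crawler domains from a script. The endpoint
# paths and payload fields below are assumptions based on the 7.13 beta
# docs; verify them against the published API reference before use.

def domains_endpoint(base_url, engine, domain_id=None):
    """Build the crawler domains endpoint URL for an engine."""
    url = f"{base_url}/api/as/v1/engines/{engine}/crawler/domains"
    return f"{url}/{domain_id}" if domain_id else url

def add_domain_request(base_url, engine, name):
    """Return (method, url, json_body) for adding a domain to the crawler."""
    return ("POST", domains_endpoint(base_url, engine), {"name": name})

def delete_domain_request(base_url, engine, domain_id):
    """Return (method, url, json_body) for removing a domain by its id."""
    return ("DELETE", domains_endpoint(base_url, engine, domain_id), None)

# For ~100k domains you would loop over your list and send each request
# with any HTTP client, authenticating via an API key header.
method, url, body = add_domain_request(
    "http://localhost:3002", "my-engine", "https://example.com"
)
```

The point of the sketch is that each domain is one small API call, so a list of ~100k domains is just a loop over your list with whatever HTTP client and rate limiting you prefer.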
Also, I would be really interested in the possibility of scheduling crawls (e.g. monthly, twice a month, ...).
With today's 7.13.0 release you should be able to configure automatic crawling at a specified frequency. You can find the API docs here: Web crawler (beta) API reference | Elastic App Search Documentation [7.13] | Elastic, or you can configure it through the UI.
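As a sketch of what that could look like from a script (the endpoint path and the "frequency"/"unit" payload fields are assumptions based on the beta docs, so check them against the linked reference):

```python
# A minimal sketch of setting a recurring crawl schedule. The endpoint
# path and the "frequency"/"unit" payload fields are assumptions based
# on the 7.13 beta docs; check the published API reference before use.

def crawl_schedule_request(base_url, engine, frequency, unit):
    """Return (method, url, json_body) for a schedule meaning
    "crawl every <frequency> <unit>"."""
    url = f"{base_url}/api/as/v1/engines/{engine}/crawler/crawl_schedule"
    return ("PUT", url, {"frequency": frequency, "unit": unit})

# Monthly crawls:
monthly = crawl_schedule_request("http://localhost:3002", "my-engine", 1, "month")
# Roughly twice a month, expressed as every two weeks:
biweekly = crawl_schedule_request("http://localhost:3002", "my-engine", 2, "week")
```

Note that with a single-interval schedule like this, "twice a month" is most naturally expressed as "every two weeks".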
Thank you for trying the Enterprise Search crawler!