Parallel crawler execution

I would like to create a crawler per domain. This would allow crawling each domain independently and running them at different times.

However, that doesn't seem to be possible. If running a single crawler consumes a lot of resources, is there a way to allocate more resources to it?

Have you tried the crawler.workers.pool_size.limit config?

From the reference config file:

# The number of parallel crawls allowed per instance of Enterprise Search.
# By default, it is set to 2x the number of available logical CPU cores.
# Note: On Intel CPUs, the default value is 4x the number of physical CPU cores
# due to hyper-threading.
#crawler.workers.pool_size.limit: N
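
For reference, the setting is uncommented and given an explicit value in enterprise-search.yml. A minimal sketch, where the value 8 is only an illustrative choice (pick a number based on the instance's CPU and memory headroom, not this example):

```yaml
# config/enterprise-search.yml
# Allow up to 8 crawls to run in parallel on this instance
# instead of the default of 2x the logical CPU core count.
crawler.workers.pool_size.limit: 8
```

Like other settings in this file, the instance typically needs a restart before the change takes effect.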
