Index-based engine lacks crawler API capability

Hi there,
I have some indexes created via the crawler in Elastic Cloud. These are automatically prefixed with search-.

I need to trigger a single-page (partial) crawl via an API call.
This API only appears to be available for engines:
(/api/as/v1/engines/[engine-name]/crawler/crawl_requests)
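For reference, here is a minimal sketch of what such a partial crawl request might look like against an engine that does have crawler domains configured. It assumes the request body takes an "overrides" object with "seed_urls" and "max_crawl_depth", and that a private API key is sent as a Bearer token. The host name, engine name, key and page URL are placeholders, and the helper name build_partial_crawl_request is made up for illustration.

```python
import json
import urllib.request


def build_partial_crawl_request(base_url, engine_name, api_key, seed_urls, max_crawl_depth=1):
    """Build (but do not send) a POST request asking the App Search crawler
    for a partial crawl limited to the given seed URLs.

    Assumption: the body shape {"overrides": {...}} and the Bearer auth
    header follow the App Search crawler API; check your version's docs.
    """
    url = (
        f"{base_url.rstrip('/')}"
        f"/api/as/v1/engines/{engine_name}/crawler/crawl_requests"
    )
    body = {
        "overrides": {
            # Only these pages are used as entry points for the crawl.
            "seed_urls": list(seed_urls),
            # Assumed: a depth of 1 restricts the crawl to the seed URLs.
            "max_crawl_depth": max_crawl_depth,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_partial_crawl_request(
    "https://my-deployment.ent.example.com",
    "my-engine-name",
    "private-xxxxxxxx",
    ["https://www.example.com/some/page"],
)
```

To actually send it, pass the result to urllib.request.urlopen(request) and read the JSON response. Building the request separately from sending it makes it easy to inspect the URL and payload first.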

So, as per the recommendation here (App Search and Workplace Search product compatibility | Enterprise Search documentation [8.15] | Elastic),
I created an index-based engine for this index.
However, the API call fails with this error message:
"error": "No crawler domains configured on the engine "[my-engine-name]""

If I create an "app-search-managed-docs" engine type, this creates a new hidden index, ".ent-search-engine-documents-[engine-name]". The partial crawl API requests against this engine work, but then I have a hidden index and not the full crawler capability (no content extraction).

Is there a way forward here for an index-based engine? I would much prefer to have search-xxx indexes created and fed by the Elasticsearch web crawler. The only thing it doesn't offer is the partial crawl API call, and the documentation suggested that an index-based engine would provide that.

Thank you in advance

Hi Svetlana -

The web crawler for App Search will only work with App Search managed indices. Unfortunately, direct Elasticsearch indexes will not be able to work with it.

However, have you taken a look at the Elastic Open Web Crawler? Hopefully this will help with what you need to do. Although there is no API interface to it, you can control the crawling via the CLI, and you may be able to extend the code to suit your needs.

Hi Mark, Thank you so much for your reply.

Is it on the roadmap to add this API to future Elastic crawler versions?
Or will it be discontinued in favour of the Open Web Crawler?

And sorry for the tangent, but I see no way of using ingest pipelines with App Search managed indexes?

Thank you again for your time and response

Svetlana -

The Open Web Crawler is probably the safer route for future-proofing your application, so I would try to use that, as there's no guarantee that the App Search crawler will gain support for non-managed indices.

As for ingest pipelines, not directly - what are you looking to do with ingest pipelines in this context?

Hi Mark, thanks again.

Ingest pipelines would be used to parse meta tags into document fields, as the crawler that comes with the App Search managed indices does not offer extraction rules to do the same.
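To illustrate the kind of pipeline meant here, below is a rough sketch of an ingest pipeline definition that splits a comma-separated meta tag value into its own list field. It assumes the crawler has already stored the tag content in a field called meta_keywords and that the index accepts ingest pipelines (so a search-xxx index rather than an App Search managed one). The target field name and the helper build_meta_keywords_pipeline are made up for illustration.

```python
import json


def build_meta_keywords_pipeline(source_field="meta_keywords", target_field="keywords"):
    """Return an ingest pipeline body that splits a comma-separated
    meta tag value into a list stored in a separate field.

    Assumption: the crawler has already written the raw meta tag
    content into `source_field`; both field names are illustrative.
    """
    return {
        "description": "Split crawled meta keywords into a list field",
        "processors": [
            {
                "split": {
                    "field": source_field,
                    # Regex: a comma followed by optional whitespace.
                    "separator": r",\s*",
                    "target_field": target_field,
                    # Skip documents that have no such meta tag.
                    "ignore_missing": True,
                }
            }
        ],
    }


# JSON body that could be sent with PUT _ingest/pipeline/<pipeline-name>.
pipeline_json = json.dumps(build_meta_keywords_pipeline(), indent=2)
```

A pipeline like this can only reshape values the crawler has already extracted. It does not read the page HTML itself, which is why extraction rules are still needed for custom meta tags.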
