Index based Engine lacks crawler API capability

Svetlana_Chirkova · August 12, 2024, 3:04pm

Hi there,
I have some indexes created via the crawler in Elastic Cloud. These are automatically prefixed with "search-".

I need to trigger a single-page (partial) crawl via an API call.
This API only appears to be available for Engines:
(/api/as/v1/engines/[engine-name]/crawler/crawl_requests)
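For reference, a partial crawl request against that endpoint might look like the sketch below. Only the path comes from the post above. The `overrides` body (`seed_urls`, `max_crawl_depth`, `sitemap_discovery_disabled`), the Bearer-style private key header, and the helper name `build_partial_crawl_request` are assumptions for illustration, so check them against the App Search crawler API reference for your version. The request is only built here, not sent.

```python
import json
import urllib.request


def build_partial_crawl_request(base_url, engine_name, api_key, page_url):
    """Build (but do not send) a crawl request limited to a single page.

    Hypothetical helper: the body shape and auth header are assumptions.
    """
    endpoint = (
        f"{base_url.rstrip('/')}"
        f"/api/as/v1/engines/{engine_name}/crawler/crawl_requests"
    )
    # Assumed body: "overrides" narrows this one crawl to the given URL only.
    body = {
        "overrides": {
            "seed_urls": [page_url],
            "max_crawl_depth": 1,
            "sitemap_discovery_disabled": True,
        }
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host, engine name, and key; replace with real values.
request = build_partial_crawl_request(
    "https://enterprise-search.example.invalid",
    "my-engine-name",
    "private-xxxx",
    "https://example.com/some/page",
)
# To actually send it: urllib.request.urlopen(request)
```

As the rest of this thread explains, a request like this only succeeds against an engine that has crawler domains configured, which in practice means an App Search managed engine.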

So, as per the recommendation here (App Search and Workplace Search product compatibility | Enterprise Search documentation [8.15] | Elastic),
I created an index-based engine for this index.
However, the API call fails with the error message:
"error": "No crawler domains configured on the engine "[my-engine-name]""

If I create an "app-search-managed-docs" engine type, this creates a new hidden index, ".ent-search-engine-documents-[engine-name]". The partial crawl API requests against this engine work. But then I have a hidden index, and not the full crawler capability (no content extraction).

Is there a way forward here for an index-based engine? I would much prefer to have search-xxx indexes created and fed by Elasticsearch web crawlers. The only thing they don't offer is the partial crawl API call, and the documentation suggested that an index-based engine would provide that.

Thank you in advance

Mark_Hoy · August 12, 2024, 3:43pm

Hi Svetlana -

The web crawler for App Search will only work with App Search managed indices. Unfortunately, direct Elasticsearch indexes will not be able to work with it.

However, have you taken a look at the Elastic Open Web Crawler? Hopefully this will help with what you need to do. Although there is no API interface to it, you can control the crawling via the CLI, and you may be able to extend the code to suit your needs.

Svetlana_Chirkova · August 12, 2024, 4:04pm

Hi Mark, Thank you so much for your reply.

Is it on the roadmap to add this API to future Elastic crawler versions?
Or will it be discontinued in favour of the Open Web Crawler?

And sorry for the tangent, but I see no way of using ingest pipelines with App Search managed indexes?

Thank you again for your time and response

Mark_Hoy · August 13, 2024, 1:35pm

Svetlana -

The Open Web Crawler is probably the safer route for future-proofing your application, so I would try to use that, as there's no guarantee that the App Search crawler will add a feature to work with non-managed indices.

As for ingest pipelines, not directly - what are you looking to do with ingest pipelines in this context?

Svetlana_Chirkova · August 13, 2024, 4:00pm

Hi Mark, thanks again.

Ingest pipelines would be used to parse meta tags into document fields, as the crawler that comes with the App Search managed indices does not offer extraction rules to do the same.
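As a rough illustration of the kind of pipeline meant here, the sketch below builds an Elasticsearch ingest pipeline definition that turns meta tag values into separate fields. It assumes the crawled documents carry `meta_keywords` and `meta_description` fields. The target field names `keywords` and `summary`, the pipeline id, and the helper name are hypothetical. Per the reply above, a pipeline like this would apply to regular `search-` indices rather than directly to App Search managed ones. The request is built but not sent.

```python
import json
import urllib.request


def build_meta_pipeline_request(es_url, pipeline_id, api_key):
    """Build (but do not send) a PUT _ingest/pipeline request.

    Hypothetical helper: source and target field names are assumptions.
    """
    pipeline = {
        "description": "Sketch: copy crawler meta tag values into fields",
        "processors": [
            # Split a comma-separated keywords string into an array.
            {
                "split": {
                    "field": "meta_keywords",
                    "separator": r"\s*,\s*",
                    "target_field": "keywords",
                    "ignore_missing": True,
                }
            },
            # Copy the description into a dedicated field when present.
            {
                "set": {
                    "field": "summary",
                    "copy_from": "meta_description",
                    "ignore_empty_value": True,
                }
            },
        ],
    }
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(pipeline).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Placeholder host, pipeline id, and key; replace with real values.
pipeline_request = build_meta_pipeline_request(
    "https://elasticsearch.example.invalid", "crawler-meta-tags", "encoded-key"
)
# To actually send it: urllib.request.urlopen(pipeline_request)
```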

