Ingest Pipeline "app_search_crawler" not running

moassafiri · October 18, 2022, 1:57am

Hello!

Why would the default Ingest Pipeline in version 8.4 not run?
There isn't much information about the conditions under which the pipeline runs. The docs just say that every document found by the crawler should go through this pipeline.

"app_search_crawler": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [

joemcelroy · October 18, 2022, 8:24am

hey @moassafiri!

Could you tell me more about your setup? From what I've read, you have an App Search engine with a web crawler configured for your domain. The App Search crawler sets up a default ingest pipeline that is primarily focused on documents: it handles binary content extraction from incoming documents and runs when an _attachment field is present.

Right now you're seeing no activity in your pipeline and are wondering when it runs? I believe the pipeline should always run, but the processors in the pipeline execute based on their own conditions.
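To illustrate the point about per-processor conditions, here is a minimal sketch of a request body for the simulate API (POST _ingest/pipeline/_simulate). It is not the actual app_search_crawler definition: the set processor, the has_binary_content field, and the sample documents are assumptions made up for the example. The only idea it shows is a processor guarded by an `if` condition on _attachment.

```python
import json


def build_simulate_body(docs):
    """Build a body for POST _ingest/pipeline/_simulate.

    The pipeline has a single processor guarded by an `if` condition, so
    documents without an _attachment field pass through unchanged.
    """
    pipeline = {
        "processors": [
            {
                "set": {
                    # Painless condition: run only when _attachment is present.
                    "if": "ctx.containsKey('_attachment')",
                    "field": "has_binary_content",  # hypothetical field name
                    "value": True,
                }
            }
        ]
    }
    return {"pipeline": pipeline, "docs": [{"_source": doc} for doc in docs]}


# Hypothetical sample documents: one HTML-like, one with binary content.
body = build_simulate_body(
    [
        {"title": "plain HTML page"},
        {"title": "PDF document", "_attachment": "<base64 data>"},
    ]
)
print(json.dumps(body, indent=2))
```

Sending this body to the simulate endpoint should show the extra field only on the second document, which is a quick way to check a condition without running a crawl.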

Have you invoked a crawl yet through the UI for your domain?

Joe

Serena_Chou · October 18, 2022, 9:01pm

Hey there, the app_search_crawler pipeline in 8.4 is really only used for binary content extracted by the App Search crawler. If you want to run all of the crawled documents through a pipeline, you'll want to use the new Elastic web crawler introduced in 8.4 (see 8.4.0 release notes | Elastic Enterprise Search documentation [8.4] | Elastic), which has a different pipeline defined.

In the release notes you can reference this section:
"App Search web crawler support for binary content extraction is now generally available. Crawl binary content such as PDF and Office documents. Note that only binary content (not HTML) is sent into the app_search_crawler pipeline, whereas the Elastic web crawler sends all indexed documents into the ent_search_crawler pipeline. See the breaking changes section if you have modified the App Search web crawler ingest pipeline settings."

moassafiri · October 18, 2022, 10:32pm

Hi, thanks for your help.

Ah, I missed that part. I understand now why it isn't working: non-binary content doesn't trigger it!

Looking at the stats, the ent_search_crawler pipeline also isn't running.
Does that crawler also work for App Search's web crawl, or just Enterprise Search?

GET _nodes/stats/ingest?filter_path=nodes.*.ingest

"ent_search_crawler": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,

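For anyone checking the same thing, here is a minimal sketch for summing one pipeline's counters across nodes from the response of the stats request above. It assumes the response has already been parsed into a dict with the nodes.*.ingest.pipelines layout; the pipeline_totals helper and the sample numbers are made up for illustration.

```python
def pipeline_totals(stats, pipeline_name):
    """Sum a pipeline's ingest counters over every node in the response."""
    totals = {"count": 0, "time_in_millis": 0, "current": 0, "failed": 0}
    for node in stats.get("nodes", {}).values():
        pipeline = node.get("ingest", {}).get("pipelines", {}).get(pipeline_name)
        if pipeline is None:
            continue  # this node has no stats for the pipeline
        for key in totals:
            totals[key] += pipeline.get(key, 0)
    return totals


# Made-up sample response with two nodes (values are illustrative only).
sample = {
    "nodes": {
        "node-a": {"ingest": {"pipelines": {
            "ent_search_crawler": {"count": 3, "time_in_millis": 12, "current": 0, "failed": 1},
        }}},
        "node-b": {"ingest": {"pipelines": {
            "ent_search_crawler": {"count": 2, "time_in_millis": 5, "current": 0, "failed": 0},
        }}},
    }
}

print(pipeline_totals(sample, "ent_search_crawler"))
print(pipeline_totals(sample, "app_search_crawler"))
```

A total count of 0 after a crawl suggests that no documents were routed through that pipeline at all, as opposed to the pipeline running while its processors skipped the documents.
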
Serena_Chou · October 19, 2022, 3:49pm

This works for the web crawler that you can access and use in 8.4 for Enterprise Search (top-level navigation -> Content -> Indices -> Create Index). You can then connect that web crawler to App Search capabilities by pointing an Elasticsearch index engine (see Elasticsearch index engines (beta) | Elastic App Search Documentation [8.4] | Elastic) at the created index. I'm super interested in your feedback on how these different crawlers are working for you, so if you'd like to do a quick user feedback session, please let me know!

moassafiri · October 28, 2022, 2:39am

Hi Serena,

We ended up applying a final pipeline to the indices, and that worked for us.
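The post doesn't say which index or pipeline was involved, so the following is only a sketch of what setting a final pipeline can look like, using the index.final_pipeline index setting. The host, index name, and pipeline name are hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def final_pipeline_request(base_url, index, pipeline):
    """Build (but do not send) a PUT <index>/_settings request that sets
    index.final_pipeline, so the pipeline runs on every document indexed."""
    body = json.dumps({"index": {"final_pipeline": pipeline}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index name, and pipeline name.
req = final_pipeline_request(
    "http://localhost:9200", "search-my-crawler-index", "my-final-pipeline"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it (authentication omitted): urllib.request.urlopen(req)
```

A final pipeline runs after any request or default pipeline, which is presumably why it applies regardless of how the crawler routes documents.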

Thanks.

system · November 25, 2022, 2:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Serena_Chou · January 17, 2023, 9:12pm

@moassafiri have you tried any of the new web crawler capabilities? In 8.5, we added managed pipelines that can be customized and more easily used for testing inference pipelines. Would you be interested in giving us feedback on whether you would consider using these newly introduced capabilities?
