Why would the default Ingest Pipeline in version 8.4 not run?
There isn't much information about the conditions under which the pipeline runs; the documentation just says that every document found by the crawler should go through this pipeline.
Could you tell me more about your setup? From what I've read, you have an App Search engine with a web crawler configured for your domain. The App Search crawler has set up a default ingest pipeline that is primarily focused on documents: it handles binary content extraction from incoming documents and runs when an _attachment field is present.
Right now you're seeing no activity within your pipeline and wondering when it runs? I believe the pipeline should always run, but the processors in the pipeline will execute based on their own sets of conditions.
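To illustrate the distinction between a pipeline running and its processors executing, here is a toy model in plain Python. It is not how Elasticsearch implements ingest pipelines, and the processor name extract_attachment is made up for illustration; the only detail taken from this thread is that the presence of an _attachment field is what triggers the work.

```python
def run_pipeline(doc, processors):
    """Pass a document through every processor; each one checks its own condition."""
    executed = []
    for name, condition, action in processors:
        if condition(doc):  # comparable in spirit to a per-processor condition
            action(doc)
            executed.append(name)
    return executed


# Hypothetical processor: only acts when the document carries an _attachment field.
processors = [
    (
        "extract_attachment",
        lambda d: "_attachment" in d,
        lambda d: d.update(body="<extracted text>"),
    ),
]

# Made-up sample documents: one HTML page, one binary file.
html_doc = {"url": "https://example.com/", "title": "Home"}
pdf_doc = {"url": "https://example.com/a.pdf", "_attachment": "JVBERi0="}

html_ran = run_pipeline(html_doc, processors)
pdf_ran = run_pipeline(pdf_doc, processors)

print("HTML document, processors executed:", html_ran)
print("PDF document, processors executed:", pdf_ran)
```

In this sketch both documents enter the pipeline, but only the one with an _attachment field causes a processor to do anything.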
Have you invoked a crawl yet through the UI for your domain?
Hey there, the app_search_crawler pipeline in 8.4 is really only used for binary content extracted by the App Search crawler. If you want to run all of the crawled documents through a pipeline, you'll want to use the new Elastic web crawler introduced in 8.4.0 (see the 8.4.0 release notes in the Elastic Enterprise Search documentation [8.4]), which has a different pipeline defined.
In the release notes you can reference the section:
" * App Search web crawler support for binary content extraction is now generally available. Crawl binary content such as PDF and Office documents. Note that only binary content (not HTML) is sent into the app_search_crawler pipeline, whereas the Elastic web crawler sends all indexed documents into the ent_search_crawler pipeline. See the breaking changes section if you have modified the App Search web crawler ingest pipeline settings."
Ah I missed that part - I understand why it isn't working, specifically around Non-Binary triggers!
When looking at the stats, that particular crawler also isn't running.
Does this crawler also work for App Search's web crawler, or just Enterprise Search?
GET _nodes/stats/ingest?filter_path=nodes.*.ingest
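As a rough sketch of how the output of that request could be read, the Python below sums the per-pipeline document counts across nodes. The sample response is invented and trimmed (node IDs and counts are placeholders, not real data); it assumes the usual layout where each node reports ingest.pipelines.<name>.count.

```python
import json

# Hypothetical, trimmed response body for:
#   GET _nodes/stats/ingest?filter_path=nodes.*.ingest
SAMPLE_RESPONSE = json.loads("""
{
  "nodes": {
    "node-a": {
      "ingest": {
        "total": {"count": 12, "time_in_millis": 40, "current": 0, "failed": 0},
        "pipelines": {
          "app_search_crawler": {"count": 0, "time_in_millis": 0, "current": 0, "failed": 0},
          "ent_search_crawler": {"count": 12, "time_in_millis": 40, "current": 0, "failed": 0}
        }
      }
    },
    "node-b": {
      "ingest": {
        "total": {"count": 30, "time_in_millis": 95, "current": 0, "failed": 0},
        "pipelines": {
          "app_search_crawler": {"count": 0, "time_in_millis": 0, "current": 0, "failed": 0},
          "ent_search_crawler": {"count": 30, "time_in_millis": 95, "current": 0, "failed": 0}
        }
      }
    }
  }
}
""")


def pipeline_counts(stats, pipeline_names):
    """Sum the 'count' of each named pipeline over every node in the response."""
    totals = {name: 0 for name in pipeline_names}
    for node in stats.get("nodes", {}).values():
        pipelines = node.get("ingest", {}).get("pipelines", {})
        for name in pipeline_names:
            # A pipeline missing from a node's stats contributes zero.
            totals[name] += pipelines.get(name, {}).get("count", 0)
    return totals


counts = pipeline_counts(SAMPLE_RESPONSE, ["app_search_crawler", "ent_search_crawler"])
for name, count in counts.items():
    print(f"{name}: {count} documents processed")
```

A count of zero for a pipeline means no documents have passed through it on any node since the stats were last reset, which matches the "no activity" symptom described above.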
This works for the web crawler that you can access and use in 8.4 for Enterprise Search (Top level navigation -> Content -> Indices -> Create Index). You can then connect that web crawler to App Search capabilities by connecting an Elasticsearch index engine (beta; see "Elasticsearch index engines" in the Elastic App Search documentation [8.4]) to the created index. I'm super interested in your feedback on how these different crawlers are working for you - so if you'd like to do a quick user feedback session, please let me know!
@moassafiri have you tried using any of the new web crawler capabilities? In 8.5, we added managed pipelines that can be customized and used more easily for testing inference pipelines. Would you be interested in giving us feedback on whether you would consider using these newly introduced capabilities?