Issue with Pipeline Ordering: Custom Flattening Pipeline Running After ML Inference Pipeline

flalar · February 7, 2025, 1:27pm

Hi everyone,

We are indexing our GitHub repositories to build our AI Assistant Knowledge Base. Our current setup only allows ML inference on text fields located at the document root, yet many key fields—such as review comments and issue comments—are nested within JSON objects.

To address this, we created a custom ingest pipeline called search-corp-github@custom. This pipeline uses a Painless script processor to flatten the document by concatenating all relevant fields (with context labels) into a single field.

However, we’re encountering an issue where the ML inference pipeline (search-corp-github@ml-inference) appears to execute before our custom pipeline. Consequently, the ML inference processor doesn’t see the flattened field because it hasn’t been created when the inference runs.

Is there a way to control or adjust the execution order between these pipelines via the Search Connector UI? Should I chain these pipelines together using a parent pipeline, or is it acceptable (or even recommended) to modify the managed base pipeline that defines the overall pipeline order—even if that generates a warning?

Alternatively, are there other recommended approaches to ensure that the ML inference processor sees all the necessary information in the document?

Any insights or best practices would be greatly appreciated!

Thanks

Jedr_Blaszyk · February 24, 2025, 5:06pm

Hey @flalar, it's expected that the @ml-inference sub-pipeline runs before the @custom sub-pipeline. We designed it that way so that users can post-process the outputs of their embeddings.

If you want to do pre-processing instead, you can manually modify the contents of your @ml-inference sub-pipeline and add more processors at the front of the list, before the inference processors.
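A minimal sketch of that suggestion, assuming the pipeline definition has already been fetched as a Python dict with a top-level `processors` list. The sub-pipeline contents, the `hypothetical-inference-pipeline` name, and the placeholder script body are illustrative assumptions, not what the connector actually generates.

```python
import copy
from typing import Any


def prepend_processor(pipeline: dict[str, Any], processor: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the pipeline body with `processor` placed first."""
    updated = copy.deepcopy(pipeline)  # Leave the caller's definition untouched.
    updated.setdefault("processors", []).insert(0, processor)
    return updated


# Assumed shape of the existing @ml-inference sub-pipeline (hypothetical contents).
ml_inference_pipeline = {
    "processors": [
        {"pipeline": {"name": "hypothetical-inference-pipeline"}},
    ]
}

# Script processor that would hold the flattening logic (placeholder source).
flatten_step = {
    "script": {
        "lang": "painless",
        "source": "/* flattening logic goes here */",
    }
}

updated_pipeline = prepend_processor(ml_inference_pipeline, flatten_step)
```

Because the flattening step now sits at index 0, it runs before any inference step in the same sub-pipeline, so the flattened field exists by the time inference reads it. The updated body would still need to be written back to Elasticsearch, for example through the ingest pipeline API.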

Here is supporting documentation: Ingest pipelines in Search | Elasticsearch Guide [8.17] | Elastic

Let me know if you have more questions!

