Issue with Pipeline Ordering: Custom Flattening Pipeline Running After ML Inference Pipeline

Hi everyone,

We are indexing our GitHub repositories to build our AI Assistant Knowledge Base. Our current setup only allows ML inference on text fields located at the document root, yet many key fields—such as review comments and issue comments—are nested within JSON objects.

To address this, we created a custom ingest pipeline called search-corp-github@custom. This pipeline uses a Painless script processor to flatten the document by concatenating all relevant fields (with context labels) into a single field.
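For context, a flattening script processor along these lines might look like the sketch below. All field names (`review_comments`, `issue_comments`, `body`, `flattened_text`) are illustrative placeholders, not the actual connector mapping:

```json
{
  "script": {
    "description": "Concatenate nested comment fields (with context labels) into a single root-level field",
    "lang": "painless",
    "source": """
      def sb = new StringBuilder();
      if (ctx.containsKey('review_comments') && ctx['review_comments'] != null) {
        for (def c : ctx['review_comments']) {
          sb.append('Review comment: ').append(c['body']).append('\n');
        }
      }
      if (ctx.containsKey('issue_comments') && ctx['issue_comments'] != null) {
        for (def c : ctx['issue_comments']) {
          sb.append('Issue comment: ').append(c['body']).append('\n');
        }
      }
      ctx['flattened_text'] = sb.toString();
    """
  }
}
```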

However, we’re encountering an issue where the ML inference pipeline (search-corp-github@ml-inference) appears to execute before our custom pipeline. Consequently, the ML inference processor doesn’t see the flattened field because it hasn’t been created when the inference runs.

Is there a way to control or adjust the execution order between these pipelines via the Search Connector UI? Should I chain these pipelines together using a parent pipeline, or is it acceptable (or even recommended) to modify the managed base pipeline that defines the overall pipeline order—even if that generates a warning?

Alternatively, are there other recommended approaches to ensure that the ML inference processor sees all the necessary information in the document?

Any insights or best practices would be greatly appreciated!

Thanks

Hey @flalar, it's expected that the @ml-inference sub-pipeline runs before the @custom sub-pipeline: the ordering was chosen so that users can post-process the output of their embeddings in @custom.

If you want to do pre-processing instead, you can manually modify the contents of your @ml-inference sub-pipeline and add your processors at the front of the list, before the inference processors.
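A minimal sketch of what that could look like, assuming a flattening script like the one described above and an ELSER-style model (the model ID and field names are illustrative):

```json
PUT _ingest/pipeline/search-corp-github@ml-inference
{
  "processors": [
    {
      "script": {
        "description": "Flattening step placed ahead of inference so the inference processor sees the combined text",
        "lang": "painless",
        "source": "ctx['flattened_text'] = ctx.getOrDefault('title', '') + '\n' + ctx.getOrDefault('body', '');"
      }
    },
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "flattened_text",
            "output_field": "ml.inference.flattened_text_expanded"
          }
        ]
      }
    }
  ]
}
```

In practice you would first `GET _ingest/pipeline/search-corp-github@ml-inference` to retrieve the existing processor list, then re-`PUT` it with your script processor prepended, so the inference processors the UI generated are preserved unchanged.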

Here is supporting documentation: Ingest pipelines in Search | Elasticsearch Guide [8.17] | Elastic

Let me know if you have more questions!