Hi everyone,
We are indexing our GitHub repositories to build our AI Assistant Knowledge Base. Our current setup only allows ML inference on text fields located at the document root, yet many key fields—such as review comments and issue comments—are nested within JSON objects.
To address this, we created a custom ingest pipeline called search-corp-github@custom. This pipeline uses a Painless script processor to flatten the document by concatenating all relevant fields (with context labels) into a single field.
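To make the flattening step concrete, here is a minimal Python sketch of the logic. It is only an illustration: the real processor is a Painless script, and the field names used here (title, body, issue_comments, review_comments, flattened_text) are assumed for the example rather than taken from our actual mapping.

```python
def flatten_for_inference(doc: dict) -> str:
    """Concatenate root-level and nested text fields into one labelled string."""
    parts = []

    # Root-level fields (hypothetical names).
    if doc.get("title"):
        parts.append(f"Title: {doc['title']}")
    if doc.get("body"):
        parts.append(f"Body: {doc['body']}")

    # Nested comment arrays (hypothetical names); each entry is an object
    # whose text is assumed to live under a "body" key.
    nested_fields = (
        ("Issue comment", "issue_comments"),
        ("Review comment", "review_comments"),
    )
    for label, field in nested_fields:
        for comment in doc.get(field) or []:
            text = (comment or {}).get("body")
            if text:
                parts.append(f"{label}: {text}")

    return "\n".join(parts)


doc = {
    "title": "Connector sync fails",
    "body": "Sync stops after the first page.",
    "issue_comments": [{"body": "Repro attached."}],
    "review_comments": [{"body": "Please add a retry."}],
}
# The single root-level field the inference processor is meant to read.
doc["flattened_text"] = flatten_for_inference(doc)
print(doc["flattened_text"])
```

The point of the context labels is that the inference model can still tell a review comment from an issue comment once everything sits in one root-level field.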
However, we’re encountering an issue where the ML inference pipeline (search-corp-github@ml-inference) appears to execute before our custom pipeline. Consequently, the ML inference processor doesn’t see the flattened field because it hasn’t been created when the inference runs.
Is there a way to control the execution order of these two pipelines from the Search Connector UI? If not, should I chain them together using a parent pipeline, or is it acceptable (or even recommended) to modify the managed base pipeline that defines the overall pipeline order, even if that generates a warning?
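For the parent-pipeline option, what I have in mind is roughly the following. This is a sketch under assumptions: it only builds the pipeline definition as a Python dict using the standard ingest "pipeline" processor, the description text is made up, and the name the parent would be stored under is left open. Whether the connector's managed pipeline would still invoke the ML inference pipeline on its own is exactly the part I am unsure about.

```python
import json


def build_parent_pipeline(ordered_pipeline_names):
    """Build an ingest pipeline body that calls other pipelines in the given order."""
    return {
        # Hypothetical description, for illustration only.
        "description": "Run the flattening pipeline before ML inference",
        # Processors run top to bottom, so list order is execution order.
        "processors": [
            {"pipeline": {"name": name}} for name in ordered_pipeline_names
        ],
    }


parent = build_parent_pipeline(
    [
        "search-corp-github@custom",        # flatten first
        "search-corp-github@ml-inference",  # then run inference
    ]
)
# The printed JSON could be used as the body of a PUT _ingest/pipeline/<name> request.
print(json.dumps(parent, indent=2))
```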
Alternatively, are there other recommended approaches to ensure that the ML inference processor sees all the necessary information in the document?
Any insights or best practices would be greatly appreciated!
Thanks