Ingestion pipeline processor error - input field does not exist

Hi, I'm working on a RAG project where we use Elasticsearch to search for relevant documents. The documents come from the web crawler. Because most LLMs have a token limit, I'm trying to chunk the documents into smaller pieces so they fit within the prompt. Rather than going straight to a coding-heavy custom approach, I found that I can set up a custom ingest pipeline in the Kibana UI. However, after I set up the script and foreach processors, I keep getting this error:

Processor 'inference' in pipeline 'search-test@custom' failed with message 'Input field [body_content_field] does not exist in the source document'

Documents from the web crawler usually have fields like body_content and title, but I'm not sure where this "body_content_field" comes from, and I couldn't find a way to debug it. Could anyone share some insights? Thanks!
The method I tried is based on this article: Chunking Large Documents via Ingest pipelines plus nested vectors equals easy passage search (Elastic Search Labs)
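For context, the chunking step in that kind of pipeline comes down to splitting the text into sentences and then packing consecutive sentences into passages that stay under a size limit. Below is a minimal Java sketch of the packing part only, assuming a greedy strategy and using a character count as a rough stand-in for model tokens. The class name PassagePacker, the method packSentences, and the limit of 30 are hypothetical illustrations, not taken from the article.

```java
import java.util.ArrayList;
import java.util.List;

public class PassagePacker {
    // Greedily joins consecutive sentences (separated by a space) into passages
    // of at most maxChars characters. Character count is only a rough proxy for
    // model tokens. A single sentence longer than maxChars becomes its own passage.
    static List<String> packSentences(List<String> sentences, int maxChars) {
        List<String> passages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            // Flush the current passage if adding " " + sentence would exceed the limit.
            if (current.length() > 0
                    && current.length() + 1 + sentence.length() > maxChars) {
                passages.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            passages.add(current.toString());
        }
        return passages;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("First sentence.", "Second one.", "Third sentence here.");
        System.out.println(packSentences(sentences, 30));
    }
}
```

Packing whole sentences keeps each passage readable on its own, which tends to matter more for retrieval quality than hitting the limit exactly.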

Hi @Joy_yang, welcome to the community, and very cool that you're building a RAG application!

There's plenty of other good content on Elasticsearch Labs too.

So you need to replace the field name used in the example with the field in your source document that actually contains the text. It appears in this line of the script:

String[] envSplit = /((?<!M(r|s|rs)\.)(?<=\.) |(?<=\!) |(?<=\?) )/.split(ctx['body_content']);

Replace body_content with the name of your text field.
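To see why the field name matters, here is a rough Java equivalent of that script line (Painless regex literals are backed by java.util.regex.Pattern, so the same pattern works). It looks up a field by name in a source document and splits it into sentences; if the name is wrong, there is simply no text to split. The class SentenceSplitter, the method splitField, the error message, and the sample document are all hypothetical; in the real pipeline this logic lives in the Painless script processor and reads from ctx.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // Same pattern as the Painless script: split on the space after '.', '!'
    // or '?', except after the abbreviations "Mr.", "Ms." and "Mrs.".
    static final Pattern SENTENCE_BREAK =
            Pattern.compile("((?<!M(r|s|rs)\\.)(?<=\\.) |(?<=\\!) |(?<=\\?) )");

    // Splits the text stored under fieldName in a source document (a stand-in
    // for ctx in the ingest script). Fails loudly, listing the fields that do
    // exist, if the field is missing or not a string.
    static List<String> splitField(Map<String, Object> source, String fieldName) {
        Object value = source.get(fieldName);
        if (!(value instanceof String)) {
            throw new IllegalArgumentException(
                    "Field [" + fieldName + "] is missing or not a string; available fields: "
                            + source.keySet());
        }
        return Arrays.asList(SENTENCE_BREAK.split((String) value));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
                "title", "Example page",
                "body_content", "Mr. Smith went home. He slept! Did he? Yes.");
        System.out.println(splitField(doc, "body_content"));
        // splitField(doc, "body_content_field") would throw, listing the real field names.
    }
}
```

Printing the available field names on failure is one cheap way to find out which field your crawler documents actually use.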
