Hi, I'm working on a RAG project where we use Elasticsearch to search for relevant documents. The documents come from a web crawler. However, since most LLMs have a token limit, I'm trying to chunk the documents into smaller pieces so they fit within the prompt's token budget. Rather than jumping straight into a custom, coding-heavy approach, I found I can set up a custom ingest pipeline in the Kibana UI. However, after I set up the script + foreach processors, I kept getting this error:
> Processor 'inference' in pipeline 'search-test@custom' failed with message 'Input field [body_content_field] does not exist in the source document'
Since the documents come from the web crawler, they should have body_content, title, etc., but I'm not sure where this "body_content_field" comes from, and I couldn't find a way to debug it. Could anyone share some insights? Thanks!
**The method I tried follows this doc:** Chunking Large Documents via Ingest pipelines plus nested vectors equals easy passage search — Elastic Search Labs
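
For reference, here's roughly what my custom pipeline looks like (a simplified sketch written out in Dev Tools syntax rather than the Kibana UI; the chunk size, `model_id`, and target field names are placeholders from my setup, not exact copies of the blog):

```json
PUT _ingest/pipeline/search-test@custom
{
  "processors": [
    {
      "script": {
        "description": "Split body_content into a passages array of ~300-word chunks",
        "lang": "painless",
        "source": """
          if (ctx['body_content'] == null) { return; }
          String[] words = ctx['body_content'].splitOnToken(' ');
          List passages = new ArrayList();
          int chunkSize = 300;
          for (int i = 0; i < words.length; i += chunkSize) {
            int end = Math.min(i + chunkSize, words.length);
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < end; j++) {
              sb.append(words[j]).append(' ');
            }
            Map passage = new HashMap();
            passage.put('text', sb.toString().trim());
            passages.add(passage);
          }
          ctx['passages'] = passages;
        """
      }
    },
    {
      "foreach": {
        "field": "passages",
        "processor": {
          "inference": {
            "model_id": "sentence-transformers__all-minilm-l6-v2",
            "target_field": "_ingest._value.vector",
            "field_map": {
              "_ingest._value.text": "text_field"
            }
          }
        }
      }
    }
  ]
}
```

Nothing in this config mentions a field called `body_content_field`, which is why I'm confused about where the inference processor is picking up that name.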