Hi, working on a RAG setup and trying to land on a sensible production architecture for chunk storage and retrieval. Curious what others are running at scale.
Large documents get split into chunks at ingestion, each chunk gets a vector embedding. The parent document has metadata that may change over time. The chunk text and vectors should stay the same after indexing.
We've looked at three approaches:
Flat chunks (each chunk is its own document with a parent_id field): the relationship between chunk and parent exists only on the application side, the engine has no awareness of it at all. So beyond the basic indexing, the application has to manage the full lifecycle: grouping search results by parent, picking the best scoring chunk, extracting the matched text, over-fetching to end up with enough results after deduplication, cleaning up orphan chunks on parent delete, and keeping parent metadata in sync on every chunk. On top of that, any parent field used as a search filter has to be copied onto every chunk document, so changing it means updating potentially hundreds of documents at once.
Nested (chunks as nested objects on the root document): the relationship is managed by the engine, which is the main appeal. Engine handles parent deduplication natively and returns the parent document directly from a chunk-level vector search, no grouping logic needed on our side. Parent-level filters also work without copying fields onto every chunk. What we're less sure about is production behaviour: the docs mention a performance overhead for nested queries compared to flat, and updating any field on the parent rewrites the whole block including all nested chunks. For frequent metadata updates on large documents, is this a real problem in practice or not noticeable?
Parent/Child join: we looked at this briefly and dropped it. The docs explicitly say has_child/has_parent queries add significant overhead, and there are threads here with 12+ second query times even on small datasets.
So the question is: for this kind of chunk storage setup, is nested the standard approach now? From documentations perspective all seem to push in that direction. Or is the nested query overhead actually noticeable in production and teams prefer to deal with the additional logic on the application side?