Using two levels of nesting in knn vector search

We have an index with two nesting levels (I know this is not optimal in ES, but we need it), and the dense vectors are in the second level. From the results I can get the top-level fields (they are unique per hit) and the lowest-level ones from the inner hits, but I can't find a way to retrieve the fields of the middle level.
Let me set up an example,
This is the mapping:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "chapters": {
        "type": "nested",
        "properties": {
          "number": {
            "type": "integer"
          },
          "c_vector": {
            "type": "dense_vector",
            "dims": 3,
            "similarity": "cosine"
          },
          "paragraphs": {
            "type": "nested",
            "properties": {
              "content": {
                "type": "text"
              },
              "p_vector": {
                "type": "dense_vector",
                "dims": 3,
                "similarity": "cosine"
              }
            }
          }
        }
      }
    }
  }
}

So, we have a book with chapters and paragraphs. Chapters also have dense vectors; they are not strictly needed for this question, but they help to illustrate it.

So, searching for chapters, I can do this query:

{
  "knn": {
    "query_vector": [1,2,3],
    "field": "chapters.c_vector",
    "k": 3,
    "num_candidates": 3,
    "inner_hits": {
      "_source": false,
      "fields": [  "chapters.number"   ]
    }
  },
  "fields": ["title","chapters.number"],
  "_source": false
}

In the results,

  • From the hit, I get the title and the list of chapters.
  • From the inner hit, I get the top matching chapter.
    OK, that's fine.

But now, if I want to search paragraphs, the query would be:

{
    "knn": {
        "query_vector": [0,2,2],
        "field": "chapters.paragraphs.p_vector",
        "k": 3,
        "num_candidates": 3,
        "inner_hits": {
            "_source": false,
            "fields": ["chapters.paragraphs.content","chapters.number"]
        }
    },
   "fields": ["title","chapters.number"],
    "_source": false
}

So, I can get the title, and in the inner hits I get the paragraph with the highest score, but how can I know which chapter this paragraph belongs to?
In the outer fields, I get all the chapters of the book.
Adding "chapters.number" to the knn inner_hits has no effect.

In regular searches, a nested query can be embedded in another nested query, and each may return its own inner_hits, but I don't think this is possible in knn searches.

Summarizing: when there are multiple levels of nesting and the knn search is done on the lowest level, how can the middle-level fields of a hit be obtained?

Hey @jcodina!

This is an interesting one: inner_hits retrieves the inner hits for just one nested level. Handling multiple nested levels requires a different approach, using the knn query instead of the top-level knn section:

{
    "query": {
        "nested": {
            "path": "chapters",
            "query": {
                "nested": {
                    "path": "chapters.paragraphs",
                    "query": {
                        "knn": {
                            "query_vector": [
                                0,
                                2,
                                2
                            ],
                            "field": "chapters.paragraphs.p_vector",
                            "k": 3,
                            "num_candidates": 3
                        }
                    },
                    "inner_hits": {
                        "fields": ["chapters.paragraphs.content"],
                        "_source": false
                    }
                }
            },
            "inner_hits": {
                "fields": ["chapters.number"],
                "_source": false
            }
        }
    },
    "_source": false,
    "fields": ["title"]
}

Note that we're using two nested queries, each one retrieving the inner_hits for its own path.
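As a rough illustration of how the two inner_hits levels could be read on the client side, here is a minimal Python sketch. It assumes each inner_hits block is keyed by its nested path ("chapters" and "chapters.paragraphs", the default name when none is set). The helper names and the sample response are hypothetical and hand-written for illustration, not real Elasticsearch output, and the exact layout inside each "fields" object may differ in your version.

```python
def inner_hit_list(hit, name):
    """Return the list of inner hits stored under `name`, or [] if absent."""
    return hit.get("inner_hits", {}).get(name, {}).get("hits", {}).get("hits", [])


def pair_paragraphs_with_chapters(response):
    """Build one row per matching paragraph, together with its chapter and book."""
    rows = []
    for book in response["hits"]["hits"]:
        # First nested level: chapters that contain at least one matching paragraph.
        for chapter in inner_hit_list(book, "chapters"):
            # Second nested level: the matching paragraphs inside that chapter.
            for paragraph in inner_hit_list(chapter, "chapters.paragraphs"):
                rows.append({
                    "book_fields": book.get("fields", {}),
                    "chapter_fields": chapter.get("fields", {}),
                    "paragraph_fields": paragraph.get("fields", {}),
                    "score": paragraph.get("_score"),
                })
    return rows


# Hand-written sample in the assumed response shape (not real output).
sample_response = {
    "hits": {"hits": [{
        "_id": "book-1",
        "fields": {"title": ["Some book"]},
        "inner_hits": {"chapters": {"hits": {"hits": [{
            "fields": {"chapters": [{"number": [2]}]},
            "inner_hits": {"chapters.paragraphs": {"hits": {"hits": [{
                "_score": 0.9,
                "fields": {"chapters": [{"paragraphs": [{"content": ["Some text"]}]}]},
            }]}}},
        }]}}},
    }]}
}

rows = pair_paragraphs_with_chapters(sample_response)
```

Each row keeps the raw "fields" objects of the three levels side by side, so the chapter of a matching paragraph can be read from the same row.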

Traversing the result is a bit complex, as you'll be dealing with nested inner hits. It might be a good idea to flatten this structure and include the chapter number in each paragraph's data.
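The flattening idea could look something like this at indexing time. This is only a sketch: the function name and the "chapter_number" field are hypothetical, and the mapping would need a matching field under chapters.paragraphs. With the number stored on each paragraph, the inner_hits of the paragraph search could presumably request it alongside the content.

```python
import copy


def add_chapter_number_to_paragraphs(book):
    """Return a copy of `book` in which each paragraph also stores its chapter's number."""
    flattened = copy.deepcopy(book)  # leave the caller's document untouched
    for chapter in flattened.get("chapters", []):
        for paragraph in chapter.get("paragraphs", []):
            # "chapter_number" is a hypothetical field name chosen for this sketch.
            paragraph["chapter_number"] = chapter["number"]
    return flattened


# Hand-written example document following the mapping from the question.
book = {
    "title": "Some book",
    "chapters": [
        {
            "number": 2,
            "c_vector": [1, 2, 3],
            "paragraphs": [
                {"content": "Some text", "p_vector": [0, 2, 2]},
            ],
        }
    ],
}

flattened_book = add_chapter_number_to_paragraphs(book)
```

The trade-off is some duplicated data in every paragraph in exchange for a simpler query and response.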

Hope that helps!