Retrieve inner_hits when searching multiple kNN fields in the same nested document

I have a use case (the examples here are simplified to the essentials) where I want to run a kNN search on multiple vector fields of the same nested document inside a larger document, and still be able to tell which nested document was responsible for each hit.
Executing the search request fails with an error. The problem seems to occur when both knn clauses specify inner_hits.

Does anyone know a way to make this work, or whether this is a bug that is (or will be) fixed in newer versions?

Below is the error, followed by the mappings and the query involved.
I'm currently using Elasticsearch v8.11.1.

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "[inner_hits] already contains an entry for key [paragraphs]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "dfs_query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "aem_pages_nl_blue",
                "node": "AW2Ds3ckQnaYS9B-H34pUw",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "[inner_hits] already contains an entry for key [paragraphs]"
                }
            }
        ],
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "[inner_hits] already contains an entry for key [paragraphs]",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "[inner_hits] already contains an entry for key [paragraphs]"
            }
        }
    },
    "status": 400
}

Relevant part of the index template mappings (the real documents contain more fields):

{
  "mappings": {
    "properties": {
      "websiteSection": {
        "type": "keyword"
      },
      "paragraphs": {
        "type": "nested",
        "properties": {
          "documentText": {
            "type": "text"
          },
          "documentTitle": {
            "type": "text"
          },
          "textEmbedding": {
            "type": "dense_vector",
            "dims": 1536,
            "index": true,
            "similarity": "dot_product"
          },
          "titleEmbedding": {
            "type": "dense_vector",
            "dims": 1536,
            "index": true,
            "similarity": "dot_product"
          }
        }
      }
    }
  }
}

The search query is sometimes a hybrid query, but not always. The error occurs either way.

{
  "knn": [
    {
      "field": "paragraphs.titleEmbedding",
      "query_vector": [
        "1536 floating point numbers left out for simplicity"
      ],
      "k": 20,
      "num_candidates": 50,
      "filter": [
        {
          "term": {
            "websiteSection": "forum"
          }
        }
      ],
      "inner_hits": {
        "_source": [
          "paragraphs.documentTitle"
        ],
        "fields": [
          "paragraphs.documentTitle"
        ]
      }
    },
    {
      "field": "paragraphs.textEmbedding",
      "query_vector": [
        "1536 (maybe different from previous) floating point numbers left out for simplicity"
      ],
      "k": 20,
      "num_candidates": 50,
      "filter": [
        {
          "term": {
            "websiteSection": "forum"
          }
        }
      ],
      "inner_hits": {
        "_source": [
          "paragraphs.documentText"
        ],
        "fields": [
          "paragraphs.documentText"
        ]
      }
    }
  ]
}

@Jasper_Simon

This is 100% a bug. I am not sure of the fix yet, but I created this GitHub issue: "Failure with `inner_hits` and multiple nested knn clauses" (Issue #103792, elastic/elasticsearch).

Thank you so much for digging into this and reporting!

@Jasper_Simon actually, I think the behavior might be OK, but I need to add some documentation.

Could you try specifying a unique "name" field for each of your inner_hits objects? Something like:

{
  "knn": [
    {
      "field": "paragraphs.titleEmbedding",
      "query_vector": [
        0.1, 0.2
      ],
      "k": 20,
      "num_candidates": 50,
      "inner_hits": {
        "name": "title",
        "_source": [
          "paragraphs.documentTitle"
        ],
        "fields": [
          "paragraphs.documentTitle"
        ]
      }
    },
    {
      "field": "paragraphs.textEmbedding",
      "query_vector": [
        0.1, 0.2
      ],
      "k": 20,
      "num_candidates": 50,
      "inner_hits": {
        "name": "text",
        "_source": [
          "paragraphs.documentText"
        ],
        "fields": [
          "paragraphs.documentText"
        ]
      }
    }
  ]
}

I think the issue is that we automatically name each inner_hits entry so that we can extract the correct field information for a given configuration. Without an explicit name, both clauses end up under the same key ("paragraphs", as the error message shows), so we have no way of distinguishing the two inner_hits configurations.
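If the knn clauses are built in application code, the naming can be enforced before the request is sent. The following is only a minimal sketch of that idea: the helper `name_inner_hits` and the convention of deriving a name from the last segment of the vector field are assumptions for illustration, not part of Elasticsearch or its clients.

```python
from copy import deepcopy


def name_inner_hits(knn_clauses):
    """Return a copy of the knn clauses where every inner_hits block has a unique name.

    Clauses that already carry a name keep it; unnamed ones get a name derived
    from the last segment of their vector field (a hypothetical convention).
    """
    named = deepcopy(knn_clauses)
    seen = set()
    for clause in named:
        inner_hits = clause.get("inner_hits")
        if inner_hits is None:
            continue  # nothing to name for clauses without inner_hits
        name = inner_hits.get("name") or clause["field"].rsplit(".", 1)[-1]
        if name in seen:
            raise ValueError(f"duplicate inner_hits name: {name!r}")
        seen.add(name)
        inner_hits["name"] = name
    return named


# query_vector and filter are left out here to keep the sketch short.
knn = [
    {"field": "paragraphs.titleEmbedding", "k": 20, "num_candidates": 50,
     "inner_hits": {"_source": ["paragraphs.documentTitle"]}},
    {"field": "paragraphs.textEmbedding", "k": 20, "num_candidates": 50,
     "inner_hits": {"_source": ["paragraphs.documentText"]}},
]

named_knn = name_inner_hits(knn)
print([clause["inner_hits"]["name"] for clause in named_knn])
# prints ['titleEmbedding', 'textEmbedding']
```

The helper works on a copy, so the original clause list stays untouched, and it raises early if two clauses would still share a name.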

Hi @BenTrent, thanks a lot for your solution. It is simple yet effective!
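For anyone landing here later, this is roughly how the named inner hits can be read back to see which nested paragraph matched on which vector. It is a sketch under assumptions: the helper `nested_matches` is hypothetical, the sample hit is hand-written rather than real output, and it relies on each hit carrying an inner_hits object keyed by the chosen names, with a `_nested` entry (field and offset) on every nested inner hit.

```python
def nested_matches(hit, names=("title", "text")):
    """Map each inner_hits name to the nested offsets and scores that matched."""
    result = {}
    for name in names:
        inner = hit.get("inner_hits", {}).get(name, {})
        result[name] = [
            (ih["_nested"]["offset"], ih["_score"])
            for ih in inner.get("hits", {}).get("hits", [])
        ]
    return result


# Hand-written, trimmed-down hit for illustration only (not real output).
sample_hit = {
    "_id": "page-1",
    "inner_hits": {
        "title": {"hits": {"hits": [
            {"_nested": {"field": "paragraphs", "offset": 2}, "_score": 0.91},
        ]}},
        "text": {"hits": {"hits": [
            {"_nested": {"field": "paragraphs", "offset": 0}, "_score": 0.87},
        ]}},
    },
}

print(nested_matches(sample_hit))
# prints {'title': [(2, 0.91)], 'text': [(0, 0.87)]}
```

The offset is the position of the matching entry in the "paragraphs" array of the parent document, which is what lets us tell the title match and the text match apart.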

Another thing I noticed in the documentation, which is currently holding us back, is the limitation that only the best-matching vector is retrieved, regardless of the "size" parameter in inner_hits.

I made a separate post about that in order to not clutter this page.
