Question about knn query on nested field and similarity parameter

Hello

According to the documentation (Knn query | Elasticsearch Guide [8.14] | Elastic), the similarity parameter can be used as a filter to include only documents whose raw similarity to the query vector is at least the given value. I've been trying to use this parameter in a knn query on a nested dense_vector field, but it does not seem to work. If I provide one value ("similarity": 0.744) nothing is returned, but if I lower it slightly ("similarity": 0.7439) everything is returned. In my usage the parameter seems to return everything or nothing, even though the _score (I know it is not the raw similarity, but it is a good gauge of how much the raw similarity varies) differs significantly across the returned nested documents.

Wondering if this could be a bug or perhaps there is a way to display the raw similarity. Or perhaps I'm just not understanding the purpose of the parameter correctly.

I'm testing out different embeddings, but both fields seem to have the same issue. Here are my embedding mappings:

                    "embedding": {
                      "type": "dense_vector",
                      "dims": 384,
                      "index": true,
                      "similarity": "cosine",
                      "index_options": {
                        "type": "int8_hnsw",
                        "m": 16,
                        "ef_construction": 100
                      }
                    },
                    "embedding_a": {
                      "type": "dense_vector",
                      "dims": 1024,
                      "index": true,
                      "similarity": "cosine",
                      "index_options": {
                        "type": "int8_hnsw",
                        "m": 16,
                        "ef_construction": 100
                      }
                    }

I'm using version 8.14.

Any help is appreciated. Thanks.

I think I may be confused as to how the similarity parameter is applied: it does reduce the number of top-level documents that come back, but it does not filter out the nested documents that may be below that score.

Hi there @Shaun_Stuart, welcome to the Elastic community and thanks for posting!

Could you please provide a little more information:

  • Sample mapping that you are sending with the nested field
  • Sample query that you are performing

A small, reproducible example is best, but any additional information will help us better understand what's going on and why it's not working as expected.

Thank you!

Hi @Kathleen_DeRusso,

Thanks for the response. I am providing an edited mapping, query and response. Our core data is conversational, and we store it at the transcript, turn (speaker A speaking, then speaker B speaking, etc.) and utterance (i.e. sentence) level. Utterances are nested below turns, and we generate an embedding for each utterance.

When I add the similarity parameter and tweak the value, the count of top-level documents does change, with a higher value reducing the results. But a hit is still generated for each nested utterance, even if it falls below the similarity threshold (I believe). I omitted the query embedding for brevity, but it was generated from the phrase "flu vaccine". I've limited the inner hits to 5 for this example, but you can see how much the _score differs between the first and last hit. Also, the inner hits total is the total number of utterances for the transcript, so no utterance is getting filtered out.

It seems like the similarity parameter is filtering the top-level documents and not the nested utterances. I was expecting (hoping) it would return only the utterances above the similarity threshold. I've also tried changing the score mode to max and avg, but this does not seem to have any effect.

mapping

{
  "mr-sstuart-reveal_1": {
    "aliases": {},
    "mappings": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "analyzer": "english"
        },
        "transcript:en:content": {
          "type": "text",
          "store": true,
          "fields": {
            "standard": {
              "type": "text",
              "store": true,
              "term_vector": "with_positions_offsets",
              "analyzer": "standard"
            }
          },
          "term_vector": "with_positions_offsets",
          "analyzer": "english"
        },
        "turns": {
          "type": "nested",
          "properties": {
            "en": {
              "type": "nested",
              "properties": {
                "content": {
                  "type": "text",
                  "store": true,
                  "fields": {
                    "standard": {
                      "type": "text",
                      "store": true,
                      "term_vector": "with_positions_offsets",
                      "analyzer": "standard"
                    }
                  },
                  "term_vector": "with_positions_offsets",
                  "analyzer": "english"
                },
                "utterances": {
                  "type": "nested",
                  "properties": {
                    "content": {
                      "type": "text",
                      "store": true,
                      "fields": {
                        "standard": {
                          "type": "text",
                          "store": true,
                          "term_vector": "with_positions_offsets",
                          "analyzer": "standard"
                        }
                      },
                      "term_vector": "with_positions_offsets",
                      "analyzer": "english"
                    },
                    "embedding": {
                      "type": "dense_vector",
                      "dims": 384,
                      "index": true,
                      "similarity": "cosine",
                      "index_options": {
                        "type": "int8_hnsw",
                        "m": 16,
                        "ef_construction": 100
                      }
                    },
                    "embedding_a": {
                      "type": "dense_vector",
                      "dims": 1024,
                      "index": true,
                      "similarity": "cosine",
                      "index_options": {
                        "type": "int8_hnsw",
                        "m": 16,
                        "ef_construction": 100
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

query

GET /mr-sstuart-reveal_1/_search
{
    "_source":
    {
        "includes":
        []
    },
    "from": 0,
    "query":
    {
        "bool":
        {
            "must":
            [
                {
                    "bool":
                    {
                        "must":
                        [
                            {
                                "nested":
                                {
                                    "ignore_unmapped": false,
                                    "inner_hits":
                                    {
                                        "name": "A",
                                        "size": 5,
                                        "_source":
                                        {
                                            "includes":
                                            ["turns.en.utterances.id","turns.en.utterances.content"]
                                        }
                                    },
                                    "path": "turns.en.utterances",
                                    "query":
                                    {
                                        "bool":
                                        {
                                            "must":
                                            [
                                                {
                                                    "bool":
                                                    {
                                                        "must":
                                                        [
                                                            {
                                                                "knn":
                                                                {
                                                                    "field": "turns.en.utterances.embedding",
                                                                    "query_vector": [],
                                                                    "num_candidates": 100,
                                                                    "similarity": 0.75
                                                                }
                                                            }
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    },
                                    "score_mode": "max"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 20,
    "sort":
    [
        {
            "_score":
            {
                "order": "desc"
            }
        },
        {
            "id":
            {
                "order": "desc"
            }
        }
    ]
}

result

{
  "took": 203,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "mr-sstuart-reveal_1",
        "_id": "2546",
        "_score": 1.1160005,
        "_source": {
        },
        "sort": [
          1.1160005,
          2546
        ],
        "inner_hits": {
          "A": {
            "hits": {
              "total": {
                "value": 172,
                "relation": "eq"
              },
              "max_score": 0.97175026,
              "hits": [
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2546",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 111,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.97175026,
                  "_source": {
                    "id": 496818,
                    "content": "Flu vaccine?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2546",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 105,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.59818417,
                  "_source": {
                    "id": 496822,
                    "content": "So, you need refills for any medicines?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2546",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 90,
                      "_nested": {
                        "field": "utterances",
                        "offset": 1
                      }
                    }
                  },
                  "_score": 0.59635586,
                  "_source": {
                    "id": 496801,
                    "content": "He seems to be a lot better now with, uh, with the medicines."
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2546",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 0,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.5929342,
                  "_source": {
                    "id": 496669,
                    "content": "Have any medications changed since the last time you were here?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2546",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 25,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.5895265,
                  "_source": {
                    "id": 496711,
                    "content": "Anxiety is better?"
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "mr-sstuart-reveal_1",
        "_id": "2545",
        "_score": 1.0795139,
        "_source": {
        },
        "sort": [
          1.0795139,
          2545
        ],
        "inner_hits": {
          "A": {
            "hits": {
              "total": {
                "value": 350,
                "relation": "eq"
              },
              "max_score": 0.90306646,
              "hits": [
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2545",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 192,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.90306646,
                  "_source": {
                    "id": 496559,
                    "content": "Flu shot?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2545",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 187,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.8017596,
                  "_source": {
                    "id": 496560,
                    "content": "No, that, yeah, the flu shot and what's that other sickness?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2545",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 180,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.7032441,
                  "_source": {
                    "id": 496554,
                    "content": "Have you ever gotten the pneumonia shot?"
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2545",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 109,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.6227246,
                  "_source": {
                    "id": 496453,
                    "content": "I was, I was getting sick when I need, the last couple of winters, not this one, but the last, going back off it, I was getting off it."
                  }
                },
                {
                  "_index": "mr-sstuart-reveal_1",
                  "_id": "2545",
                  "_nested": {
                    "field": "turns",
                    "offset": 0,
                    "_nested": {
                      "field": "en",
                      "offset": 186,
                      "_nested": {
                        "field": "utterances",
                        "offset": 0
                      }
                    }
                  },
                  "_score": 0.62057436,
                  "_source": {
                    "id": 496561,
                    "content": "Shingles?"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

Thanks
Shaun

Thanks for the explanation.

I experimented, and I can demonstrate that similarity does work on a small example:

DELETE /test 

PUT /test
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 10
      }
    }
  }
}

POST test/_doc
{
  "my_vector": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}

POST test/_doc
{
  "my_vector": [23, 3, 2, 5, 54, 4, 2.5, 8, 99, 2545]
}

POST test/_search
{
  "query": {
    "knn": {
      "field": "my_vector",
      "query_vector": [23, 3, 2, 5, 77, 4, 2.5, 8, 99, 2545],
      "num_candidates": 100, 
      "similarity": 0.5
    }
  }
}

Unfortunately, the best option I can give you for inspecting the raw similarity is the explain parameter, and it does not show the similarity, only that the threshold was met. I'll ask the team and see if they have better suggestions.

Thanks for the follow-up. Your example is a bit different, as the dense_vector field is not a nested field. But thank you for the response; I'm curious as to what the team may suggest.
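To make the difference concrete, here is a minimal sketch of what a nested variant of the small example could look like. It only builds and prints the three Console requests; it does not talk to a cluster. The index name test_nested, the utterances field layout and the 3-dimensional toy vectors are assumptions made for illustration and are not taken from the mapping in this thread.

```python
import json

# Hypothetical index name and toy vectors, chosen only for this sketch.
INDEX = "test_nested"

# A nested field holding one small cosine-similarity vector per utterance.
mapping = {
    "mappings": {
        "properties": {
            "utterances": {
                "type": "nested",
                "properties": {
                    "content": {"type": "text"},
                    "embedding": {
                        "type": "dense_vector",
                        "dims": 3,
                        "index": True,
                        "similarity": "cosine",
                    },
                },
            }
        }
    }
}

# One parent document with one utterance near the query and one far from it.
doc = {
    "utterances": [
        {"content": "close to the query", "embedding": [1.0, 0.0, 0.0]},
        {"content": "far from the query", "embedding": [0.0, 1.0, 0.0]},
    ]
}

# knn query wrapped in a nested query, with inner_hits and a similarity threshold.
search = {
    "query": {
        "nested": {
            "path": "utterances",
            "inner_hits": {"_source": {"includes": ["utterances.content"]}},
            "query": {
                "knn": {
                    "field": "utterances.embedding",
                    "query_vector": [1.0, 0.1, 0.0],
                    "num_candidates": 10,
                    "similarity": 0.75,
                }
            },
        }
    }
}


def console(method: str, path: str, body: dict) -> str:
    """Render a request in the Console format used elsewhere in this thread."""
    return f"{method} /{path}\n{json.dumps(body, indent=2)}"


console_requests = [
    console("PUT", INDEX, mapping),
    console("POST", f"{INDEX}/_doc?refresh=true", doc),
    console("POST", f"{INDEX}/_search", search),
]
print("\n\n".join(console_requests))
```

With these toy vectors only the first utterance has a cosine similarity above 0.75 to the query vector, so the inner hits of the response show directly whether the second utterance is filtered out.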

OK, so a similarity of 0.75 would mean we filter on a _score of 0.875 (for cosine, _score = (1 + similarity) / 2), which seems to be working?
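The arithmetic behind that number can be sketched as follows, assuming the documented cosine mapping for dense_vector fields, _score = (1 + cosine) / 2. The function names are hypothetical helpers, and with int8_hnsw quantization the values recovered from _score should be read as approximations of the raw similarity.

```python
def cosine_to_score(cosine: float) -> float:
    """Map a raw cosine similarity to the _score Elasticsearch reports for it."""
    return (1.0 + cosine) / 2.0


def score_to_cosine(score: float) -> float:
    """Invert the mapping to estimate the raw cosine similarity from a _score."""
    return 2.0 * score - 1.0


# The threshold used in the query above.
print(cosine_to_score(0.75))

# Estimated raw similarity of the first two inner hits in the result above.
for score in (0.97175026, 0.59818417):
    print(score, round(score_to_cosine(score), 4))
```

Read this way, the first inner hit (roughly 0.94) is above the 0.75 threshold, while the second (roughly 0.20) is well below it.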

I think I may be confused as to how the similarity parameter is applied: it does reduce the number of top-level documents that come back, but it does not filter out the nested documents that may be below that score.

Ah, yes, OK. I think this is a bug. We do not apply the similarity when gathering the inner hits. We do apply it when actually searching and scoring the documents via the vectors. But, gathering inner_hits is done later.
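Until inner hits honor the threshold, one possible stopgap is to filter them on the client. The sketch below is an assumption-laden illustration and not an official workaround: filter_inner_hits is a hypothetical helper, it expects the search response as a plain dict, and it compares against the _score equivalent of the similarity, (1 + 0.75) / 2 = 0.875 for cosine. The sample response is trimmed from the result posted earlier in the thread.

```python
from typing import Any


def filter_inner_hits(
    response: dict[str, Any], name: str, min_score: float
) -> dict[str, list[dict[str, Any]]]:
    """Return, per top-level _id, the inner hits whose _score is at least min_score."""
    kept: dict[str, list[dict[str, Any]]] = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {}).get("hits", {}).get("hits", [])
        kept[hit["_id"]] = [h for h in inner if h["_score"] >= min_score]
    return kept


# Trimmed copy of the response above: two inner hits per transcript.
sample = {
    "hits": {
        "hits": [
            {"_id": "2546", "inner_hits": {"A": {"hits": {"hits": [
                {"_score": 0.97175026, "_source": {"content": "Flu vaccine?"}},
                {"_score": 0.59818417, "_source": {"content": "So, you need refills for any medicines?"}},
            ]}}}},
            {"_id": "2545", "inner_hits": {"A": {"hits": {"hits": [
                {"_score": 0.90306646, "_source": {"content": "Flu shot?"}},
                {"_score": 0.8017596, "_source": {"content": "No, that, yeah, the flu shot and what's that other sickness?"}},
            ]}}}},
        ]
    }
}

# 0.875 is the _score equivalent of "similarity": 0.75 for cosine.
filtered = filter_inner_hits(sample, "A", 0.875)
for doc_id, hits in filtered.items():
    print(doc_id, [h["_source"]["content"] for h in hits])
```

Note that inner_hits only returns up to its size (5 in the query above), so the size would need to be large enough to cover every utterance that can pass the threshold, and the reported inner hits total would still count the unfiltered utterances.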

Thanks for the confirmation @BenTrent. And appreciate the prompt responses from the team.