Hi Elastic experts, I have the following Elasticsearch query, which returns 452 hits:
{
  "explain": true,
  "knn": {
    "field": "derived.models.multilingualE5LargeInstruct",
    "query_vector": [{{1024 element long dense vector here}}],
    "k": 10000,
    "num_candidates": 10000,
    "similarity": 0.85
  },
  "size": 10000,
  "_source": false
}
The hit with the lowest score looks like this:
{
  "_shard": "[test_index][7]",
  "_node": "H20TnXpITWqMUtJ-7vZhVw",
  "_index": "test_index",
  "_id": "REDACTED",
  "_score": 0.9250033,
  "_explanation": {
    "value": 0.9250033,
    "description": "within top k documents",
    "details": []
  }
}
To me, this implies that the similarity between this document and the query vector is 0.925. However, if I change the kNN search above to use a similarity of 0.90, no hits are returned. Can any experts explain why this is happening? Shouldn't the document above exceed the 0.90 similarity threshold and be returned?
The dense vector field is defined in the index as follows; the index has 12 shards and 1 replica.
"multilingualE5LargeInstruct": {
  "properties": {
    "summary": {
      "type": "dense_vector",
      "dims": 1024,
      "index": true,
      "similarity": "cosine",
      "index_options": {
        "type": "int8_hnsw",
        "m": 16,
        "ef_construction": 100
      }
    }
  }
}
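For reference, here is a quick check of the numbers above. It assumes the scoring described in the Elasticsearch docs for `cosine` dense_vector fields (`_score = (1 + cosine) / 2`), and that the kNN `similarity` option is compared against the raw cosine rather than against `_score`. The helper names are made up for illustration and are not part of any Elastic client:

```python
# Sketch only: assumes the documented Elasticsearch scoring for `cosine`
# dense_vector fields, where _score = (1 + cosine) / 2, and that the kNN
# `similarity` parameter is compared against the raw cosine, not the _score.
# The helper names below are hypothetical, not part of any Elastic client.


def cosine_to_score(cosine: float) -> float:
    """Map a raw cosine similarity in [-1, 1] to a kNN _score in [0, 1]."""
    return (1.0 + cosine) / 2.0


def score_to_cosine(score: float) -> float:
    """Invert the mapping: recover the raw cosine similarity from a _score."""
    return 2.0 * score - 1.0


if __name__ == "__main__":
    lowest_score = 0.9250033  # _score of the lowest-ranked hit quoted above
    print(f"cosine of lowest hit: {score_to_cosine(lowest_score):.4f}")
    print(f"_score needed for similarity 0.90: {cosine_to_score(0.90):.4f}")
```

Under that assumption, this prints a cosine of 0.8500 for the lowest hit (right at the 0.85 threshold) and a required `_score` of 0.9500 for a `similarity` of 0.90.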
Thanks everyone!