KNN _score not lining up with similarity filter

Hi Elastic experts, I have the following Elasticsearch query, which returns 452 hits.

{
    "explain": true,
    "knn": {
        "field": "derived.models.multilingualE5LargeInstruct",
        "query_vector": [{{1024 element long dense vector here}}],
        "k": 10000,
        "num_candidates": 10000,
        "similarity": 0.85
    },
    "size": 10000,
    "_source": false
}

The document with the lowest score looks like this.

{
    "_shard": "[test_index][7]",
    "_node": "H20TnXpITWqMUtJ-7vZhVw",
    "_index": "test_index",
    "_id": "REDACTED",
    "_score": 0.9250033,
    "_explanation": {
        "value": 0.9250033,
        "description": "within top k documents",
        "details": []
    }
}

To me, this implies that the similarity between this document and the searched vector is 0.925. However, if I modify the above KNN search to use similarity of 0.90, no hits are returned. Can any experts explain why this is happening? Shouldn't the above document exceed the 0.90 similarity threshold and be returned?

The dense vector is defined in the index like so; the index has 12 shards and 1 replica.

"multilingualE5LargeInstruct": {
    "properties": {
        "summary": {
            "type": "dense_vector",
            "dims": 1024,
            "index": true,
            "similarity": "cosine",
            "index_options": {
                "type": "int8_hnsw",
                "m": 16,
                "ef_construction": 100
            }
        }
    }
}

Thanks everyone!

Hi @Yakob

First, I was wondering why you are using k=10000 and num_candidates=10000. Could you clarify?

Elasticsearch uses the HNSW (Hierarchical Navigable Small World) algorithm to navigate the vector space efficiently without comparing the query against every vector; num_candidates controls how many candidates are examined on each shard.

Reducing the search space with HNSW and then finding the k most similar results can improve your outcome and make it more relevant.

@Yakob if you are wanting to filter by similarity take a look at the docs here: k-nearest neighbor (kNN) search | Elasticsearch Guide [8.17] | Elastic

Scores are computed differently based on the type of similarity used. In the default case it will be cosine similarity.

So it should follow this formula:

(2 * _score) - 1

Or in your case:

(2 * 0.9250033) - 1 = 0.8500066

Which I think makes sense with what you described: the document shows up when your similarity threshold is 0.85 because its actual cosine similarity is about 0.85. Any threshold above that, including 0.90, will exclude it.
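To make the conversion concrete, here is a small sketch (plain Python, using the values from this thread) of how a cosine-similarity threshold maps to a minimum _score and back:

```python
# For cosine similarity, Elasticsearch normalizes scores into [0, 1]:
#   _score = (1 + cosine_similarity) / 2
# so a kNN "similarity" threshold s corresponds to a minimum _score of (1 + s) / 2.

def score_from_similarity(sim):
    return (1 + sim) / 2

def similarity_from_score(score):
    return 2 * score - 1

observed_score = 0.9250033                     # lowest-scoring hit in the thread
print(similarity_from_score(observed_score))   # ~0.8500066: the real cosine similarity

print(score_from_similarity(0.85))  # 0.925 -> score cutoff for similarity=0.85
print(score_from_similarity(0.90))  # 0.95  -> score cutoff for similarity=0.90
# The document's score (0.9250033) clears 0.925 but not 0.95,
# which is why it disappears when the threshold is raised to 0.90.
```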

Let me know if that doesn't make sense; happy to dig into it further with you.

Thank you so much for your replies Alex and John! I apologize for my delayed response; I was out of town for a few days last week. To answer your questions:

First, I was wondering why you are using k=10000 and num_candidates=10000. Could you clarify?

The index I am working with has 1,500,000 documents, and the product requirement is to grab as many sufficiently relevant documents as possible. From spot checking, I have found that a similarity of 0.85 seems to be the sweet spot where returned documents are still relevant enough for my app's purpose. Most search vectors will not hit a full 10k results, but sometimes 10k relevant results are expected and correct. The app I'm working on runs this KNN query to build a pool of candidate documents, which is then further refined later in the app (I can't use a filtered KNN search because of the issues outlined in my other post here: Efficient Subquery Combinations). As an aside, my team is considering increasing index.max_result_window to 20k, as the 10k limit will occasionally keep us from grabbing a few desired documents.

Reading through the docs, using the maximum allowed size for both k and num_candidates does seem to be an anti-pattern. I haven't done a lot of vector math up to this point, so I will research the HNSW algorithm so I can better understand those minutiae.

So it should follow this formula: (2 * _score) - 1
Or in your case: (2 * 0.9250033) - 1 = 0.8500066
Which I think makes sense with what you described: the document shows up when your similarity threshold is 0.85, and any threshold above that, including 0.90, will exclude it.

Yes, you are correct! I had already read the KNN document you linked, but I missed that formula. I ran a few more test queries and all of the responses follow the expected values when using the formula:

_score = (similarity + 1) / 2

Thanks for clearing this up for me! This answers my original question, but I'm happy to learn more about KNN or answer any other questions you two have.

Thanks for clearing this up for me! This answers my original question, but I'm happy to learn more about KNN or answer any other questions you two have.

awesome!

Reading through the docs, use of max_size k and num_candidates does seem to be an anti-pattern. I haven't done a lot of vector math up to this point, so I will research the HNSW algorithm so I can better understand those minutiae.

If it helps at all, I'm happy to talk through this in a little more detail; I work on that part of the ES stack.

The app I'm working on runs this KNN query to build a pool of possible final documents, which is then further refined later in the app (I can't use a filtered KNN search because of the issues outlined in my other post here: Efficient Subquery Combinations).

I missed the original post you had created; apologies for that. I took a brief look and would be curious where you've gotten with that query. We might be able to iterate here a bit, and you can always request consulting services too. It sounds like you've already tried a fair amount. I'm also not sure about the RRF suggestion, but if you reached out to consulting or support, they might (I'm honestly not sure) be willing to give you some free cycles to play around with RRF.

for reference pulled from your other post:

  • A' = A
  • B' = B - A
  • C' = C - (A + B)
  • D' = D - (A + B + C)

To give my high-level thoughts (without having spent a ton of time on the queries in your other post): num_candidates determines how much of the HNSW graph is explored, so raising it means spending a lot more time traversing the graph, and explorations much past 10k will probably start to time out. When k and num_candidates are equal, you get back all of the closest 10k results found in the HNSW graph, and k becomes mostly irrelevant. If you really need all of those results, that's probably the best you can do.

If you can run multiple queries, it might ultimately be more efficient. You may have tried this already, but querying for A and then querying for B with filter criteria that exclude all docs from A, using smaller num_candidates values, may be more efficient (I'm honestly not sure; I'd be curious to learn whether that's the case and whether I'm understanding what you're trying to do). Case 2 from your other post also seems like it could be a subsequent metadata-only query. Fun problem space nonetheless, but it definitely seems like you'll have to experiment to get to an efficient query.
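A rough sketch of that multi-query exclusion idea, building the request bodies as plain dicts (field name taken from the thread; the doc IDs and vectors are placeholders, and an `ids` must_not filter is one possible way to realize B' = B - A):

```python
# Sketch: run kNN for set A, then run kNN for set B while filtering out
# every document already returned by A, so the second pool is B' = B - A.

def knn_query(vector, similarity, exclude_ids=None, k=1000):
    body = {
        "knn": {
            "field": "derived.models.multilingualE5LargeInstruct",
            "query_vector": vector,
            "k": k,
            "num_candidates": k,   # smaller per-sub-query budget than a single 10k query
            "similarity": similarity,
        },
        "size": k,
        "_source": False,
    }
    if exclude_ids:
        # kNN filters are applied during the graph search, so excluded
        # documents don't consume slots among the k results.
        body["knn"]["filter"] = {
            "bool": {"must_not": {"ids": {"values": list(exclude_ids)}}}
        }
    return body

# First query: A' = A (no exclusions)
query_a = knn_query([0.1] * 1024, similarity=0.85)

# Second query: B' = B - A (exclude everything the first response returned)
ids_from_a = ["doc1", "doc2"]   # collected from the first response's hits
query_b = knn_query([0.2] * 1024, similarity=0.85, exclude_ids=ids_from_a)
```

Each body can then be sent to the _search endpoint; whether the smaller num_candidates per sub-query actually beats one large query would need benchmarking on the real index.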