Elasticsearch is returning fewer than the top K matches for a vector search

I have a problem where Elasticsearch doesn't return k matches for a kNN search. It used to work, so I think something changed between versions 8.8.3 and 8.10.3. Perhaps a minimum score was introduced? I could not find anything about it in the docs, though.

It's the following case:

I have an index where the textEmbedding field has the following mapping:

"textEmbedding": {
  "type": "dense_vector",
  "dims": 1536,
  "index": true,
  "similarity": "cosine"
}

The index contains two documents.

Document 1:

{
  "_index": "test_index_c64dcf58",
  "_id": "author1",
  "_source": {
    "id": "author1",
    // .....
    "textEmbedding": [
      -0.888888,
      -0.888888,
      // and so on... (they all are the same NEGATIVE number: -0.888888)
    ]
  }
}

Document 2:

{
  "_index": "test_index_c64dcf58",
  "_id": "author2",
  "_source": {
    "id": "author2",
    // .....
    "textEmbedding": [
      0.888888,
      0.888888,
      // and so on... (they all are the same POSITIVE number: 0.888888)
    ]
  }
}

For the following search query I would expect to get both matches. Since there are only two documents, both of them should be among the k=10 best matches. However, it only returns 1 of the documents, namely the one with the negative vector values. The match that it does return makes sense because it is closest to the query vector.

GET /test_index_c64dcf58/_search
{
  "size": 10,
  "from": 0,
  "knn": {
    "field": "textEmbedding",
    "k": 10,
    "num_candidates": 100,
    "query_vector": [
      -0.777777,
      -0.777777,
      // ...and so on...
    ],
    "filter": {
      "bool": {
        "must": [],
        "must_not": []
      }
    }
  },
  "aggs": {}
}
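
To make the expectation concrete, here is a minimal sketch that computes the cosine similarity between the query vector and the two document vectors outside of Elasticsearch. The `cosine` helper is hypothetical and written only for this illustration; the vectors are the constant-valued ones described above, with 1536 dimensions as in the mapping.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


DIMS = 1536
query = [-0.777777] * DIMS
author1 = [-0.888888] * DIMS  # document 1: all values negative
author2 = [0.888888] * DIMS   # document 2: all values positive

print(cosine(query, author1))  # approximately 1: same direction as the query
print(cosine(query, author2))  # approximately -1: opposite direction
```

So author1 is the best possible match and author2 the worst possible one, but with only two documents in the index and k=10, both should still be returned.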

It seems to be related to the empty filter clause. Once I remove the filter part of the query it works as expected and it returns both documents. This behaviour must have changed between version 8.8.3 and 8.10.3.
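
As a stopgap, the client can avoid sending a filter that has no clauses. Below is a minimal sketch, assuming the request body is assembled in Python before being sent; `build_knn_body` is a hypothetical helper, not part of any Elasticsearch client, and it only builds the dictionary without contacting the cluster.

```python
def build_knn_body(field, query_vector, k=10, num_candidates=100, filter_clauses=None):
    """Build a kNN search body, attaching "filter" only when it has clauses.

    filter_clauses is a dict such as {"must": [...], "must_not": [...]}.
    Clauses with empty lists are dropped; if nothing is left, no filter is sent.
    """
    knn = {
        "field": field,
        "k": k,
        "num_candidates": num_candidates,
        "query_vector": query_vector,
    }
    # Keep only the bool clauses that actually contain something.
    clauses = {name: value for name, value in (filter_clauses or {}).items() if value}
    if clauses:
        knn["filter"] = {"bool": clauses}
    return {"size": k, "from": 0, "knn": knn}


# The empty filter from the query above is omitted entirely.
body = build_knn_body(
    "textEmbedding",
    [-0.777777] * 1536,
    filter_clauses={"must": [], "must_not": []},
)
print("filter" in body["knn"])  # False
```

This only sidesteps the symptom on the client side; it says nothing about why the empty filter changes the result.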

After removing the empty filter clause, the second document is returned, but it gets a score of -1.7881393e-7. Shouldn't document scores be non-negative?

@sbruinsje this does seem like a bug. Since it reproduces reliably, could you open an issue here: Issues · elastic/elasticsearch · GitHub

Returning a negative score seems like a Lucene bug; we shouldn't be doing that. But I bet it's a floating-point math error, as -0.0000001 is really close to zero...
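
To illustrate the floating-point theory, here is a rough sketch of the arithmetic. It assumes the documented score mapping for `cosine` similarity, `_score = (1 + cosine(query, vector)) / 2`, and it does not reproduce how Lucene actually computes the value. The input `-1 - 3 * 2**-23` is an assumed, illustrative cosine (slightly below -1 by a few float32 rounding steps); it was chosen because it yields the reported score, not because it is known to be what Lucene produced. The function names are hypothetical.

```python
FLOAT32_EPS = 2.0 ** -23  # spacing between adjacent float32 values just above 1.0


def knn_cosine_score(cos):
    """Score mapping assumed from the docs: (1 + cosine) / 2."""
    return (1.0 + cos) / 2.0


def clamped_score(cos):
    """Same mapping, but never below zero."""
    return max(0.0, knn_cosine_score(cos))


exact_cos = -1.0                           # mathematically exact for opposite vectors
rounded_cos = -1.0 - 3 * FLOAT32_EPS       # assumed rounding undershoot

print(knn_cosine_score(exact_cos))    # 0.0
print(knn_cosine_score(rounded_cos))  # about -1.788e-07, matching the reported score
print(clamped_score(rounded_cos))     # 0.0
```

Under these assumptions the negative score is what a cosine that lands just below -1 would give, and clamping the result at zero would hide it.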

The issue has been created here: Elasticsearch returning less than k results for a knn search and returns negative document scores · Issue #100975 · elastic/elasticsearch · GitHub