How to combine KNN and query_string search

I am trying to combine both:

  • KNN query
  • Query_string query

When I search for a non-existent ID with query_string:

POST myindex/_search
{ 
"query": {"query_string": {"default_operator": "AND", "query": "id:\"my_wrong_id\""}}, 
"size": 10
}

I get no results, which is correct.

When I combine it with a KNN query:


POST myindex/_search
{"query": {"query_string": {"default_operator": "AND", "query": "id:\"my_wrong_id\""}},
  "knn": {"field": "Phrase_vector", "query_vector": [-0.052947998046875, -0.00864410400390625, -0.02484130859375, -0.042999267578125, -0.00775146484375, -0.043212890625, -0.059478759765625, 0.04034423828125, 0.01229095458984375, 0.033294677734375, -0.00850677490234375, -0.007640838623046875, 0.01214599609375, -0.006191253662109375, -0.0156097412109375, -0.0238494873046875, -0.006107330322265625, 0.029266357421875, -0.01296234130859375, -0.0006251335144042969, -0.034820556640625, -0.0281982421875, -0.017578125, -0.0263519287109375, 0.039031982421875, -0.0142364501953125, -0.021331787109375, -0.01183319091796875, -0.005702972412109375, -0.00821685791015625, -0.052154541015625, -0.013214111328125, -0.033203125, 0.013671875, -0.0013818740844726562, -0.00705718994140625, 0.0125885009765625,
   -0.005588531494140625, -0.006771087646484375, -0.019195556640625, 0.02801513671875, -0.0328369140625, -0.0109100341796875, 0.006252288818359375, -0.06011962890625, -0.0149993896484375, -0.02520751953125, -0.0548095703125, 0.03009033203125, -0.0271148681640625, 0.01617431640625, -0.0111846923828125, -0.007633209228515625, -0.01397705078125, 0.037261962890625, 0.036346435546875, -0.034210205078125, 0.00734710693359375, -0.032012939453125, -0.005397796630859375, 0.06256103515625, 0.0538330078125, -0.01221466064453125, 0.0455322265625, 0.025421142578125, 0.032989501953125, 0.023162841796875, 0.0237274169921875, 0.0286407470703125, -0.05133056640625, 0.00043487548828125], "k": 10, "num_candidates": 100},  
   "size": 10
}

I get results! There are records matching this vector, but the wrong id should filter them out and give an empty result.

It looks like the query_string is ignored.

I am surprised that this combination is not supported.
I could not find anybody else having this issue.

My Elasticsearch version is 8.11.2.

Hybrid search (specifying both query and knn) operates as an OR, so you will retrieve hits that come from the query_string OR from the KNN search.

KNN is a similarity search, so even if there's no close match, it still returns the nearest vectors it finds in the vector space. Therefore your query above will always return hits.

If you want to return similar vector hits that have a particular id, you can add a pre-filter to the KNN query. See: Knn query | Elasticsearch Guide [8.15] | Elastic
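
As a rough sketch of that pre-filter idea (not tested against a cluster), the request body could be built like this in Python. The field name and query string are taken from the question; the helper name `knn_search_with_prefilter` and the short placeholder vector are only for illustration, and I am assuming the `filter` option of the top-level `knn` section behaves as described in the guide.

```python
import json


def knn_search_with_prefilter(query_vector, query_text, k=10, num_candidates=100):
    """Build a _search body whose knn section is restricted by a query_string filter.

    Hypothetical helper: it only assembles the JSON body, it does not call Elasticsearch.
    """
    return {
        "knn": {
            "field": "Phrase_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Pre-filter: only documents matching this query are kNN candidates.
            "filter": {
                "query_string": {"default_operator": "AND", "query": query_text}
            },
        },
        "size": 10,
    }


# Placeholder vector; a real request needs the full embedding.
body = knn_search_with_prefilter([0.001, 0.002, 0.003], 'id:"my_wrong_id"')
print(json.dumps(body, indent=2))
```

Note there is no top-level `query` here: the query_string lives inside `knn.filter`, so a wrong id should leave the KNN search with no candidates instead of being OR-ed with it.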

You can also adjust the minimum similarity to shrink the region of the vector space that counts as similar. See: Knn query | Elasticsearch Guide [8.15] | Elastic
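
A minimal sketch of the minimum-similarity idea, again only building the body in Python. The `similarity` key is the parameter I understand the guide to describe; the threshold `0.8` is an arbitrary illustrative value (what it means depends on the similarity metric of the vector field), and `with_min_similarity` is a hypothetical helper name.

```python
import json


def with_min_similarity(knn_section, similarity):
    """Return a copy of a knn section with a minimum similarity threshold added."""
    updated = dict(knn_section)  # shallow copy so the original section is untouched
    updated["similarity"] = similarity
    return updated


knn = {
    "field": "Phrase_vector",
    "query_vector": [0.001, 0.002, 0.003],  # placeholder vector
    "k": 10,
    "num_candidates": 100,
}

# 0.8 is an assumed threshold for illustration only.
body = {"knn": with_min_similarity(knn, 0.8), "size": 10}
print(json.dumps(body, indent=2))
```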

Joe

Hello Ivan ;)

A solution could be to use a bool query with 2 must clauses:

  • the query_string query
  • the knn query

As the query_string won't match, the document will not show up.

Dadoonet, the bool query looks like a very promising solution, but I could not get it to work:

This query (and many other attempts):

{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "Phrase_vector",
            "query_vector": [0.001, 0.002, 0.003],
            "k": 10,
            "num_candidates": 100
          }
        },
        {
          "query_string": {
            "default_operator": "AND",
            "query": "(myfield:myvalue)"
          }
        }
      ]
    }
  },
  "size": 10
}

Produces:

{
  'error': {
    'root_cause': [
      {'type': 'x_content_parse_exception', 'reason': '[1:38] [bool] failed to parse field [must]'}
    ],
    'type': 'x_content_parse_exception',
    'reason': '[1:38] [bool] failed to parse field [must]',
    'caused_by': {
      'type': 'illegal_argument_exception',
      'reason': '[knn] queries cannot be provided directly, use the [knn] body parameter instead'
    }
  },
  'status': 400
}

In the documentation link provided, right below the example you mentioned, there is this example:

Knn query | Elasticsearch Guide [8.15] | Elastic (section "Hybrid search with knn query")

This suggests that the bool query should be possible (the example uses a match query, but that is in the same family as query_string).

The knn query isn't available in 8.11.2, so you will need to upgrade to 8.15 for this.

Also note that when the knn query sits inside a bool query, the other clauses are applied as a post-filter, not a pre-filter:

  1. pre-filtering – filter is applied during the approximate kNN search to ensure that k matching documents are returned.
  2. post-filtering – filter is applied after the approximate kNN search completes, which results in fewer than k results, even when there are enough matching documents.
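
To make the two variants concrete, here is a hedged sketch of both request bodies for a version that has the knn query (8.15 in this thread). Field names and values come from the queries above; the function names are made up for illustration, and this only builds the JSON without sending it anywhere.

```python
import json

# Values reused from the question; the vector is a short placeholder.
KNN = {
    "field": "Phrase_vector",
    "query_vector": [0.001, 0.002, 0.003],
    "k": 10,
    "num_candidates": 100,
}
QUERY_STRING = {
    "query_string": {"default_operator": "AND", "query": "(myfield:myvalue)"}
}


def pre_filtered_body():
    """Filter inside the knn query: applied during the approximate kNN search."""
    return {"query": {"knn": {**KNN, "filter": QUERY_STRING}}, "size": 10}


def post_filtered_body():
    """Filter next to the knn query in a bool: applied after the kNN search,
    so fewer than k hits may come back."""
    return {
        "query": {"bool": {"must": [{"knn": dict(KNN)}], "filter": [QUERY_STRING]}},
        "size": 10,
    }


pre = pre_filtered_body()
post = post_filtered_body()
print(json.dumps(pre, indent=2))
print(json.dumps(post, indent=2))
```

If the goal is "the k nearest vectors among documents matching the query_string", the first shape looks like the better fit; the second can return fewer than k hits, or none, even when matching documents exist.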

I came to the same conclusion over the weekend:

KNN vector search is still a work in progress, which I can understand, and I have to upgrade our cluster to 8.15.1.

I upgraded to 8.15.1 on my laptop and got a different message from a later check, but nothing about the bool and knn combination anymore. So that seems to be the solution.

Thank you both for your help. Please close.