KNN search returns an empty result set when num_candidates is less than the filtered doc count

I have a problem with my knn query. I am applying a knn search with a filter clause (pre-filter). The filter combines a range query on a date field with some term or wildcard queries on keyword/wildcard fields. All of these filters return result sets whose documents have the dense_vector field populated. When I apply an approximate knn query with said filter clause, my cluster returns an empty result set whenever num_candidates is less than the number of documents that match the filter. If I increase num_candidates by 1, making it equal to the filtered document count, I get k results back as expected for any value of k that I choose.

According to the documentation, if num_candidates is greater than or equal to the filtered document count, the search bypasses the HNSW graph and uses a brute-force search over the filtered documents. The behavior I am seeing suggests that the brute-force search works, but approximate knn using the HNSW graph fails for some reason.

This behavior does not occur for all filters. Some filters return results as expected, while others do not, and I cannot see a pattern linking the filters that succeed to one another. It seems almost random.

The profiles of the two queries can be found below. The query that returns the expected results uses a “DocAndScoreQuery” in the knn section and a “ConstantScoreQuery” in the searches section (with a “KnnScoreDocQuery” in its children list). The query that fails uses a “MatchNoDocsQuery” in both of these sections. It seems as though Elasticsearch has decided, using some metric, that we will get no results and is therefore returning nothing. I have no explanation for why it would make this decision.

Cluster information:

  • Version: 8.7.1
  • Nodes: 1

Index information

  • Shards: 1
  • Doc count: 4580676 (4,095 of these do not have the required vector field for semantic search)
  • Index size: 7.48GB

Mapping information:

The knn search is performed on a dense_vector field with the following properties:

  • Dimension: 128
  • Similarity: dot_product
  • Excluded from source
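
Put together, the relevant part of the mapping looks roughly like this (a sketch only; the index name is a stand-in and all other fields are omitted):

PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": ["vector_field"]
    },
    "properties": {
      "vector_field": {
        "type": "dense_vector",
        "dims": 128,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}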

I am currently unable to reproduce the bug anywhere except in this cluster. I have other clusters with the same ES version and mappings but different numbers of nodes and shards. I cannot get knn to fail in this way on another cluster.

Does anybody have any theories as to what might be causing this? Any theories or suggestions will be greatly appreciated.

Profile for unsuccessful query:

{
        "id": "[A6OyFGpQRk-eOjezM8BZEQ][srhw-sms-2024-10-07][0]",
        "dfs": {
          "statistics": {
            "type": "statistics",
            "description": "collect term statistics",
            "time_in_nanos": 5452,
            "breakdown": {
              "term_statistics": 0,
              "collection_statistics": 0,
              "collection_statistics_count": 0,
              "create_weight": 3835,
              "term_statistics_count": 0,
              "rewrite_count": 0,
              "create_weight_count": 1,
              "rewrite": 0
            }
          },
          "knn": [
            {
              "query": [
                {
                  "type": "MatchNoDocsQuery",
                  "description": """MatchNoDocsQuery("")""",
                  "time_in_nanos": 813,
                  "breakdown": {
                    "set_min_competitive_score_count": 0,
                    "match_count": 0,
                    "shallow_advance_count": 0,
                    "set_min_competitive_score": 0,
                    "next_doc": 0,
                    "match": 0,
                    "next_doc_count": 0,
                    "score_count": 0,
                    "compute_max_score_count": 0,
                    "compute_max_score": 0,
                    "advance": 0,
                    "advance_count": 0,
                    "count_weight_count": 0,
                    "score": 0,
                    "build_scorer_count": 16,
                    "create_weight": 226,
                    "shallow_advance": 0,
                    "count_weight": 0,
                    "create_weight_count": 1,
                    "build_scorer": 587
                  }
                }
              ],
              "rewrite_time": 2345906,
              "collector": [
                {
                  "name": "SimpleTopScoreDocCollector",
                  "reason": "search_top_hits",
                  "time_in_nanos": 5420
                }
              ]
            }
          ]
        },
        "searches": [
          {
            "query": [
              {
                "type": "MatchNoDocsQuery",
                "description": """MatchNoDocsQuery("User requested "match_none" query.")""",
                "time_in_nanos": 673,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 0,
                  "match": 0,
                  "next_doc_count": 0,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 0,
                  "advance_count": 0,
                  "count_weight_count": 0,
                  "score": 0,
                  "build_scorer_count": 16,
                  "create_weight": 301,
                  "shallow_advance": 0,
                  "count_weight": 0,
                  "create_weight_count": 1,
                  "build_scorer": 372
                }
              }
            ],
            "rewrite_time": 278,
            "collector": [
              {
                "name": "TotalHitCountCollector",
                "reason": "search_count",
                "time_in_nanos": 562
              }
            ]
          }

Profile of successful query:

 {
        "id": "[A6OyFGpQRk-eOjezM8BZEQ][srhw-sms-2024-10-07][0]",
        "dfs": {
          "statistics": {
            "type": "statistics",
            "description": "collect term statistics",
            "time_in_nanos": 3637,
            "breakdown": {
              "term_statistics": 0,
              "collection_statistics": 0,
              "collection_statistics_count": 0,
              "create_weight": 2396,
              "term_statistics_count": 0,
              "rewrite_count": 0,
              "create_weight_count": 1,
              "rewrite": 0
            }
          },
          "knn": [
            {
              "query": [
                {
                  "type": "DocAndScoreQuery",
                  "description": "DocAndScore[242]",
                  "time_in_nanos": 93470,
                  "breakdown": {
                    "set_min_competitive_score_count": 0,
                    "match_count": 0,
                    "shallow_advance_count": 0,
                    "set_min_competitive_score": 0,
                    "next_doc": 7871,
                    "match": 0,
                    "next_doc_count": 242,
                    "score_count": 242,
                    "compute_max_score_count": 0,
                    "compute_max_score": 0,
                    "advance": 7277,
                    "advance_count": 22,
                    "count_weight_count": 0,
                    "score": 23583,
                    "build_scorer_count": 44,
                    "create_weight": 29494,
                    "shallow_advance": 0,
                    "count_weight": 0,
                    "create_weight_count": 1,
                    "build_scorer": 25245
                  }
                }
              ],
              "rewrite_time": 4218315,
              "collector": [
                {
                  "name": "SimpleTopScoreDocCollector",
                  "reason": "search_top_hits",
                  "time_in_nanos": 60580
                }
              ]
            }
          ]
        },
        "searches": [
          {
            "query": [
              {
                "type": "ConstantScoreQuery",
                "description": "ConstantScore(ScoreAndDocQuery)",
                "time_in_nanos": 96408,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 451,
                  "match": 0,
                  "next_doc_count": 2,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 9877,
                  "advance_count": 22,
                  "count_weight_count": 0,
                  "score": 0,
                  "build_scorer_count": 44,
                  "create_weight": 58238,
                  "shallow_advance": 0,
                  "count_weight": 0,
                  "create_weight_count": 1,
                  "build_scorer": 27842
                },
                "children": [
                  {
                    "type": "KnnScoreDocQuery",
                    "description": "ScoreAndDocQuery",
                    "time_in_nanos": 36738,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 184,
                      "match": 0,
                      "next_doc_count": 2,
                      "score_count": 0,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 8740,
                      "advance_count": 22,
                      "count_weight_count": 0,
                      "score": 0,
                      "build_scorer_count": 44,
                      "create_weight": 11966,
                      "shallow_advance": 0,
                      "count_weight": 0,
                      "create_weight_count": 1,
                      "build_scorer": 15848
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 9404,
            "collector": [
              {
                "name": "TotalHitCountCollector",
                "reason": "search_count",
                "time_in_nanos": 3273
              }
            ]
          }

Hi @Jamie123 !

As you mention, the profile of the unsuccessful query shows that it's doing a MatchNoDocsQuery, meaning that there are no possible results for the query.

Can you please share the query you're using?

Hi @Carlos_D
Thank you for your response!

The query that I am using looks like this:

{
  "knn": [
    {
      "field": "vector_field",
      "query_vector": [
            0.10405463725328445,
            0.06872836500406265,
            -0.03154413402080536,
            0.05876253917813301,
            0.00010786696657305583,
            -0.03682033345103264,
            -0.02495177648961544,
            -0.019133174791932106,
            0.09605824947357178,
            -0.0750531554222107,
            -0.210474893450737,
            -0.12536780536174774,
            0.04762619733810425,
            -0.029159026220440865,
            0.0062823728658258915,
            0.05450090393424034,
            -0.09959766268730164,
            0.04474235326051712,
            -0.08019277453422546,
            0.1169605404138565,
            0.046298615634441376,
            -0.11489452421665192,
            -0.006372060161083937,
            0.010990869253873825,
            0.04755079001188278,
            0.10116023570299149,
            0.000918519392143935,
            0.0027101014275103807,
            -0.1502690613269806,
            -0.14312244951725006,
            0.1215350404381752,
            0.007427239790558815,
            0.03728736564517021,
            0.1368429958820343,
            -0.11339599639177322,
            -0.11459450423717499,
            0.06264454126358032,
            0.04414265230298042,
            0.012543505989015102,
            0.02852642349898815,
            -0.12434303760528564,
            0.03353649750351906,
            0.03726150095462799,
            0.07234278321266174,
            -0.1345919668674469,
            -0.09530984610319138,
            0.1395033299922943,
            0.10010628402233124,
            -0.10837505757808685,
            0.10268478840589523,
            -0.06319449096918106,
            0.1211763396859169,
            0.03178740292787552,
            -0.01597677357494831,
            -0.06661062687635422,
            0.10101081430912018,
            0.10408773273229599,
            0.010791797190904617,
            0.039536766707897186,
            0.07304368168115616,
            -0.05732041224837303,
            0.20468732714653015,
            0.16652728617191315,
            0.07594912499189377,
            -0.013228477910161018,
            0.12920239567756653,
            -0.11352842301130295,
            -0.08272847533226013,
            -0.017150182276964188,
            -0.07550862431526184,
            -0.22037598490715027,
            0.14705835282802582,
            0.22986359894275665,
            0.00656925467774272,
            0.1398448497056961,
            -0.030111905187368393,
            0.01367101538926363,
            -0.08472618460655212,
            -0.0376223623752594,
            0.05935221537947655,
            0.03775160759687424,
            -0.08872424066066742,
            0.013909764587879181,
            0.08566707372665405,
            -0.044493574649095535,
            0.023887664079666138,
            -0.1446654200553894,
            -0.024344027042388916,
            0.16167685389518738,
            -0.03262116387486458,
            -0.12575432658195496,
            -0.0172551479190588,
            0.007975244894623756,
            -0.022601231932640076,
            -0.06843382120132446,
            0.08958141505718231,
            -0.016171058639883995,
            -0.04362731799483299,
            0.004653268028050661,
            -0.046542149037122726,
            0.0013423251220956445,
            -0.1443037986755371,
            0.0247117318212986,
            0.0070696319453418255,
            -0.008625822141766548,
            0.1437695175409317,
            0.029397638514637947,
            -0.06349257379770279,
            -0.004613952711224556,
            -0.10620557516813278,
            0.04382902756333351,
            -0.08233462274074554,
            0.09582098573446274,
            -0.15342384576797485,
            -0.04546114802360535,
            0.013855633325874805,
            -0.05375361070036888,
            0.11182045936584473,
            -0.04343185946345329,
            -0.019864404574036598,
            -0.026637574657797813,
            0.02133280225098133,
            0.08539711683988571,
            -0.014102846384048462,
            0.14996066689491272,
            -0.06198882311582565,
            -0.17123034596443176,
            -0.09001388400793076
      ],
      "k": 2,
      "num_candidates": 157,
      "filter": [
        {
          "bool": {
            "must": [
              {
                "range": {
                  "sale_time_field": {
                    "gte": 1727311473000,
                    "lt": 1727313491000
                  }
                }
              },
              {
                    "bool": {
                      "minimum_should_match": "1",
                      "should": [
                        {
                          "wildcard": {
                            "product_code": {
                              "case_insensitive": true,
                              "wildcard": "*12"
                            }
                          }
                        }
                      ]
                    }
                  }
            ]
          }
        }
      ]
    }
  ],
  "size":2,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "tie_breaker_id": {
        "order": "desc"
      }
    }
  ],
  "track_total_hits": true
}

As soon as num_candidates is less than 157, I get the empty result set with the MatchNoDocsQuery.
The query below returns 157 results.

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "sale_time_field": {
              "gte": 1727311473000,
              "lt": 1727313491000
            }
          }
        },
        {
          "exists": {
            "field": "vector_field"
          }
        },
        {
          "bool": {
            "minimum_should_match": "1",
            "should": [
              {
                "wildcard": {
                  "product_code": {
                    "case_insensitive": true,
                    "wildcard": "*12"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Thanks @Jamie123 !

Can you please run the Validate API with the rewrite and all_shards parameters and share the result for both queries (the knn query and the direct bool query)?

GET my-index-000001/_validate/query?rewrite=true&all_shards=true

How many shards does your index have? Can you provide the output of GET _cat/shards for your index?

Also, what ES version are you running?

Hi @Carlos_D

Here are the answers to your questions.

Validate API result for direct bool query:

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "my_index",
      "shard": 0,
      "valid": true,
      "explanation": """+sale_time_field:[1727311473000 TO 1727313490999] +ConstantScore(FieldExistsQuery [field=vector_field]) +product_code:AutomatonQuery {
org.apache.lucene.util.automaton.Automaton@49029149}"""
    }
  ]
}

Validate API result for the knn query (both in the case where it returns results and the case where it doesn't):

{
  "valid": false
}

I ran the Validate API against the knn query in my other ES cluster, where I never have this issue, and got the same

{
  "valid": false
}

response.

The line for my_index in the _cat/shards response looks as follows:

my_index                                           0 p STARTED     4580676    7.4gb 10.47.1.13 minion12

I am using Elasticsearch version 8.7.1.

Thank you again for your assistance.

Hi @Jamie123 :

I've checked with the team, and this seems like a bug. knn should return values.

Can you please open an issue in our GH repo?

It would be super useful to have a way of reproducing this. If you can spend some time putting together a minimal dataset and mapping that reproduce the error, it would help get the bug resolved.

As a workaround, you can use exact nearest neighbours via script_score. When only a few documents match the filter, it will be faster than using knn (check this blog for some details on that).
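
For example, something along these lines should give exact scoring over the filtered documents (a rough, untested sketch reusing your field names; query_vector is truncated and should contain the same 128-dimensional vector, and the + 1.0 only keeps the script score non-negative, which works because dot_product similarity requires unit-length vectors):

GET my-index/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": [
            { "range": { "sale_time_field": { "gte": 1727311473000, "lt": 1727313491000 } } },
            { "wildcard": { "product_code": { "wildcard": "*12", "case_insensitive": true } } },
            { "exists": { "field": "vector_field" } }
          ]
        }
      },
      "script": {
        "source": "dotProduct(params.query_vector, 'vector_field') + 1.0",
        "params": {
          "query_vector": [ ... ]
        }
      }
    }
  }
}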

Hope that helps!

Hi @Carlos

Supplying a set of steps/data to reproduce this may be difficult since my attempts to do so on different ES instances (including local docker) have failed. It also doesn't happen for every knn filter.

I suspect the cause of this behaviour isn't an Elasticsearch bug per se but rather something segment related in the index. I've been focusing on one specific index where this behaviour occurs most frequently, and the segment details show that only 26 of its 27 segments are compound segments. Note that this index is also no longer receiving new data.
My understanding is that an HNSW graph is built per index segment. Is it possible that a non-compound segment explains why approximate knn search fails for certain filter clauses on this index? Could it hint at a segment failure somewhere?

Running GET _cat/segments shows the following:

index                 shard prirep segment generation docs.count docs.deleted     size searchable compound
my-index-2024-09-26   0     p      _74            256     163416         7979  233.2mb true       true
my-index-2024-09-26   0     p      _95            329    2783466        18809    3.6gb true       false
my-index-2024-09-26   0     p      _9f            339     191334        94229  531.9mb true       true
my-index-2024-09-26   0     p      _ai            378     169536        46764  305.1mb true       true
my-index-2024-09-26   0     p      _ba            406     339918        44655  522.2mb true       true
my-index-2024-09-26   0     p      _c7            439      91094        36164  238.5mb true       true
my-index-2024-09-26   0     p      _cj            451     148760        60199    406mb true       true
my-index-2024-09-26   0     p      _cu            462       3532          563   11.4mb true       true
my-index-2024-09-26   0     p      _cv            463       3427          527     11mb true       true
my-index-2024-09-26   0     p      _cx            465       3497          558   11.3mb true       true
my-index-2024-09-26   0     p      _d2            470      20412         6014   72.8mb true       true
my-index-2024-09-26   0     p      _d7            475     335292        54050  600.6mb true       true
my-index-2024-09-26   0     p      _dd            481      63669         6729  138.9mb true       true
my-index-2024-09-26   0     p      _df            483      15930         8484   70.2mb true       true
my-index-2024-09-26   0     p      _dg            484      15868         8473     70mb true       true
my-index-2024-09-26   0     p      _dn            491       2967           18    8.9mb true       true
my-index-2024-09-26   0     p      _dq            494     110603         5123  186.3mb true       true
my-index-2024-09-26   0     p      _dr            495         12            0   48.1kb true       true
my-index-2024-09-26   0     p      _ds            496         15            0   57.1kb true       true
my-index-2024-09-26   0     p      _dt            497      77182        78837  381.5mb true       true
my-index-2024-09-26   0     p      _du            498       6666         3814     27mb true       true
my-index-2024-09-26   0     p      _dw            500      14251        14375   70.8mb true       true
my-index-2024-09-26   0     p      _dy            502       6244         6022   32.8mb true       true
my-index-2024-09-26   0     p      _dz            503      13582            0   21.9mb true       true
my-index-2024-09-26   0     p      _e0            504          1            0   18.9kb true       true
my-index-2024-09-26   0     p      _e1            505          1            0   18.9kb true       true
my-index-2024-09-26   0     p      _e2            506          1            0   18.9kb true       true

To test, I manually reindexed (not using the reindex API) the same data set into a separate index on the same cluster. I then tested the above failing knn query, which now returns results as expected. This seems to suggest that something had gone wrong within the internals of the index, causing the knn to fail.

Comparing the details of the older (failing) and newer (successful) version of the same index:

index                           doc_count   size     num_of_segments
my-index-2024-09-26             4580676     7.48gb   27
duplicate-my-index-2024-09-26   4580676     5.4gb    28

It would appear that the duplicated index also has one non-compound segment, yet the above knn search still works on it. The output of GET _cat/segments shows the following:

index                             shard prirep segment generation docs.count docs.deleted     size searchable compound
duplicate-my-index-2024-09-26   0     p      _1v             67      15146            0   23.1mb true       true
duplicate-my-index-2024-09-26   0     p      _2w            104      18816            0   28.5mb true       true
duplicate-my-index-2024-09-26   0     p      _5h            197     383605            0  427.6mb true       true
duplicate-my-index-2024-09-26   0     p      _5y            214      10506            0   24.2mb true       true
duplicate-my-index-2024-09-26   0     p      _6x            249        828            0    1.2mb true       true
duplicate-my-index-2024-09-26   0     p      _73            255        979            0    3.1mb true       true
duplicate-my-index-2024-09-26   0     p      _74            256        580            0    1.9mb true       true
duplicate-my-index-2024-09-26   0     p      _76            258      27797            0   34.3mb true       true
duplicate-my-index-2024-09-26   0     p      _7b            263       1759            0    5.8mb true       true
duplicate-my-index-2024-09-26   0     p      _7d            265       1841            0      6mb true       true
duplicate-my-index-2024-09-26   0     p      _7k            272       2272            0    5.6mb true       true
duplicate-my-index-2024-09-26   0     p      _7m            274        321            0  926.2kb true       true
duplicate-my-index-2024-09-26   0     p      _7n            275      36950            0  109.7mb true       true
duplicate-my-index-2024-09-26   0     p      _7o            276      36913            0  108.3mb true       true
duplicate-my-index-2024-09-26   0     p      _7s            280       5563            0   18.6mb true       true
duplicate-my-index-2024-09-26   0     p      _7t            281       2215            0    7.8mb true       true
duplicate-my-index-2024-09-26   0     p      _7u            282       1487            0    4.4mb true       true
duplicate-my-index-2024-09-26   0     p      _7v            283        849            0    2.9mb true       true
duplicate-my-index-2024-09-26   0     p      _7w            284       7688            0   25.7mb true       true
duplicate-my-index-2024-09-26   0     p      _7x            285       9346            0     31mb true       true
duplicate-my-index-2024-09-26   0     p      _7y            286       5787            0   19.6mb true       true
duplicate-my-index-2024-09-26   0     p      _7z            287       5531            0   18.1mb true       true
duplicate-my-index-2024-09-26   0     p      _80            288    3655740            0    3.9gb true       false
duplicate-my-index-2024-09-26   0     p      _81            289       1231            0    2.8mb true       true
duplicate-my-index-2024-09-26   0     p      _82            290        743            0    1.7mb true       true
duplicate-my-index-2024-09-26   0     p      _83            291        374            0  907.6kb true       true
duplicate-my-index-2024-09-26   0     p      _84            292        343            0  845.6kb true       true
duplicate-my-index-2024-09-26   0     p      _89            297     345466            0  541.1mb true       true

Could this be caused by some fault in a segment during indexing or the merge process? If so, it's important for my use case that I can answer the following questions:

  • What are the general conditions under which these faults occur?
  • How often could this occur?
  • Is there a way to know when such faults occur?

I'm unfortunately not able to snapshot my index as it contains sensitive information but am happy to scrape any index metadata that could assist with possible theories.
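
For example, if per-segment details beyond _cat/segments would be useful, I believe I can share the output of requests along these lines (assuming these expose the level of segment metadata you'd need):

GET my-index-2024-09-26/_segments
GET my-index-2024-09-26/_stats/segments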

Any clarification or information will be greatly appreciated.

Thank you again for your assistance.

Hi @Jamie123 :

I don't think compound segments should have an impact on this problem - we should have detected it in our tests in that case.

Given that reindexing solves the problem, this hints at the HNSW graph itself. Did you modify the index_options defaults on your dense_vector field?
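
For reference, if index_options was left unset, the field should behave as if it had been mapped with something like this (m and ef_construction shown with what I believe are the defaults, so treat the exact values as approximate):

"vector_field": {
  "type": "dense_vector",
  "dims": 128,
  "index": true,
  "similarity": "dot_product",
  "index_options": {
    "type": "hnsw",
    "m": 16,
    "ef_construction": 100
  }
}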

We are not aware of any bug in knn that could relate to this, and we have been unable to reproduce it on our side. Checking the code paths, nothing comes to mind as a possible cause.

That said, 8.7.1 was released in May 2023, and there have been quite a few improvements and iterations on knn since then. Updating to a newer version would be the recommended path, as you would benefit from multiple improvements to knn search.

I am facing the same problem when using filters with knn in the query, exactly as described above.

Cluster information:

  • Version: 8.15
  • Nodes: 2

Index information

  • Shards: 6
  • Doc count: ~4 million (all documents have the vector field for semantic search)
  • Index size: 200 GB
  • Model: .multilingual-e5-small_linux-x86_64

The behavior is identical in the most recent versions.

I'll try to explain, according to my understanding, what can happen:

The smaller dots represent the documents.

The green circle represents the candidate limit for this search.

Everything within the green circle is what the model considers semantically relevant to a specific term, without any filter.

The blue diagonal arrow represents the filter, and the blue circles are the documents that match this filter.

In other words, without the filter, there are 3 documents; with the filter, there are 7 documents.

Hi @Carlos_D

I've opened this case, where I've been able to reproduce similar behaviour with knn search. I'm not sure if it's the same behaviour causing this issue, but possibly.

Thank you @Keanu for including reproducible steps! We'll look into it :+1: