I’m upgrading Elasticsearch from 8.17.3 to 8.19.10 and ran into a behavioural change with kNN + aggregations that breaks an existing use case.
What worked in 8.17.3:
We use knn inside the query DSL (bool.must) together with size: 0, because we don’t need hits, only the aggregations.
In 8.17.3, omitting k allowed aggregations to effectively run over all kNN candidates (num_candidates) across shards, as long as they passed a similarity / min_score threshold.
This let us:
- keep response size small (size: 0)
- run large nested aggregations
- avoid artificially bounding results to top-k
What breaks in 8.19.10:
In 8.19.10, the same query returns empty aggregation buckets.
After investigation, it appears that:
- k is now eagerly defaulted to size
- with size: 0, this effectively becomes k = 0
- aggregations then run over zero kNN hits
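To make the suspected behaviour concrete, here is a toy Python model of what we think is happening. It is not Elasticsearch code: `effective_k` is a hypothetical name, and the defaulting rule is only our reading of the observed results, not something we have confirmed in the source.

```python
from typing import Optional


def effective_k(k: Optional[int], size: int) -> int:
    """Toy model (assumption): an omitted k falls back to the request's size."""
    return size if k is None else k


# With size: 0 and no explicit k, the kNN stage would keep zero hits,
# so the aggregations would have nothing to run over.
print(effective_k(None, 0))   # 0
print(effective_k(100, 0))    # 100
```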
Setting an explicit k fixes the empty buckets, but introduces a hard top-k bound (e.g. k ≤ 10k), which changes the semantics for us:
- our previous queries aggregated over all candidates above a similarity threshold
- now they are strictly bounded to top-k neighbours
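For reference, a minimal Python sketch of the explicit-k workaround described above. The helper name `with_explicit_k` is hypothetical, it assumes the function_score -> query -> bool -> must layout from our sample, and the value 3500 is only an illustration that matches num_candidates in that sample.

```python
import copy


def with_explicit_k(body: dict, k: int) -> dict:
    """Return a copy of the search body with k set on every knn clause in bool.must.

    Assumes the layout of our sample request:
    query -> function_score -> query -> bool -> must.
    """
    patched = copy.deepcopy(body)  # leave the caller's body untouched
    must = patched["query"]["function_score"]["query"]["bool"]["must"]
    for clause in must:
        if "knn" in clause:
            clause["knn"]["k"] = k
    return patched


# Trimmed-down version of the sample request (query_vector elided, as in the sample).
body = {
    "size": 0,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "knn": {
                                "field": "embeddings_768_bgebase",
                                "query_vector": [],
                                "num_candidates": 3500,
                                "similarity": 0,
                            }
                        }
                    ]
                }
            }
        }
    },
}

patched = with_explicit_k(body, 3500)
```

This restores non-empty buckets for us, but the aggregations then only see the top-k neighbours, which is exactly the truncation we want to avoid.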
Our use case is aggregation-heavy (nested + reverse_nested) and the kNN stage is only meant to define the candidate set, not to limit results to top-k.
In practice, the number of documents above the similarity threshold can vary from 1k to >1M, and we need aggregations to reflect that set.
Question
- Is this behaviour change intentional (possibly related to “eager defaulting of k”)?
- Is there a supported way in 8.19+ to:
  - keep size: 0
  - avoid hard top-k truncation
  - and still aggregate over the full kNN candidate set (num_candidates)?
Sample Snippet
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "_score / params.total_boost",
              "params": {
                "total_boost": 1
              }
            }
          }
        }
      ],
      "min_score": 0.5,
      "query": {
        "bool": {
          "filter": [],
          "must": [
            {
              "knn": {
                "field": "embeddings_768_bgebase",
                "query_vector": [],
                "num_candidates": 3500,
                "boost": 1,
                "similarity": 0
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "total_hits_bucket": {
      "filter": {
        "match_all": {}
      },
      "aggs": {
        "score_filters": {
          "range": {
            "ranges": [
              {
                "from": 0.6
              }
            ],
            "script": {
              "source": "_score"
            }
          }
        }
      }
    }
  },
  "from": 0,
  "size": 0
}