Hello Elasticsearch Community,
I'm facing a critical challenge trying to balance search accuracy and performance when combining kNN queries with `collapse` and `inner_hits` in Elasticsearch 8.17. I'm seeing puzzling filter behavior and significant performance differences between two common kNN query patterns.
My goal is to retrieve offers (documents) based on vector similarity, but strictly exclude any offers with certain `flag_id` values (21 or 22). I also need to `collapse` results by `doc.sku.keyword` and pull in the "best offer" within each collapsed group using `inner_hits`.
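For context, the collapse portion of my request looks roughly like this (the `best_offer` name, `size`, and sort are just how I pick the top-scoring offer per SKU, not anything special):

```json
"collapse": {
  "field": "doc.sku.keyword",
  "inner_hits": {
    "name": "best_offer",
    "size": 1,
    "sort": [ { "_score": { "order": "desc" } } ]
  }
}
```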
Here's the problem:
Scenario 1: Using Top-Level `knn` (Fast, but Filter Inaccurate with `collapse`)
When I place the `knn` configuration at the top level of my search request, it's very fast. However, I observe unexpected behavior with `collapse` and `inner_hits`.
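Simplified, the request looks roughly like this (the index name `offers`, the vector field `embedding`, the tiny placeholder vector, and the `k`/`num_candidates` values are all stand-ins for my real setup):

```json
GET /offers/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 100,
    "num_candidates": 1000,
    "filter": {
      "bool": {
        "must_not": { "terms": { "flag_id": [21, 22] } }
      }
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": { "name": "best_offer", "size": 1 }
  }
}
```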
Example:
- `offer1` (on Shard1) belongs to `SKU_X` and matches my `knn`'s `filter` (no `flag_id` 21 or 22).
- `offer2` (on Shard2) also belongs to `SKU_X` but does NOT match my `knn`'s `filter` (it has `flag_id` 21 or 22).
Observation: Despite `offer2` failing the `knn`'s `filter`, if `offer1` is a strong kNN match, the final collapsed results for `SKU_X` still include `offer2` within the `inner_hits`. If I run the same `knn` query without `collapse`, `offer2` is correctly excluded.
My Question:
- Why does the `filter` specified within a top-level `knn` query not reliably prevent documents like `offer2` from being returned via `inner_hits` within a collapsed group, even when `offer2` itself does not satisfy that filter? Is the `knn` filter's scope not extended to `inner_hits`?
Scenario 2: Using `knn` within the `query` DSL (Accurate, but Very Slow)
When I embed the `knn` query within the `query` DSL (e.g., inside a `bool` -> `must` clause), the filtering behaves correctly: `offer2` is accurately excluded as expected, even with `collapse` and `inner_hits`.
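Simplified, that variant looks roughly like this (same placeholders as in Scenario 1):

```json
GET /offers/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, 0.3],
            "k": 100,
            "num_candidates": 1000,
            "filter": {
              "bool": {
                "must_not": { "terms": { "flag_id": [21, 22] } }
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": { "name": "best_offer", "size": 1 }
  }
}
```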
Observation: While accurate, this query pattern leads to a drastic increase in latency and consistently pushes my cluster's CPU usage to 100%.
My Question:
- What fundamentally causes `knn` queries embedded within the `query` DSL to be so much slower and more resource-intensive than the top-level `knn` search, even with identical `k` and `num_candidates` values? Is there a difference in how the underlying kNN search is executed or how candidates are gathered in these two contexts?
My Overall Goal:
I need to achieve both:
- Correct filter application: Ensuring only offers matching the kNN filter are considered for hits and `inner_hits`.
- High performance: Maintaining low latency and reasonable CPU usage.
Any insights into these behaviors and strategies for achieving my objective would be greatly appreciated!
Thank you for your time and expertise.