Hello Elasticsearch Community,
I'm facing a critical challenge trying to balance search accuracy and performance when combining kNN queries with collapse and inner_hits in Elasticsearch 8.17. I'm seeing a puzzling filter behavior and significant performance differences between two common kNN query patterns.
My goal is to retrieve offers (documents) based on vector similarity, but strictly exclude any offers with certain flag_ids (21 or 22). I also need to collapse results by doc.sku.keyword and pull in the "best offer" within each collapsed group using inner_hits.
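For context, here is a simplified sketch of the relevant mapping — the `embedding` field name, its dimensions, and the similarity metric are placeholders, not my exact schema, but the `dense_vector`, `flag_ids`, and `doc.sku` structure matches what I described:

```json
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      },
      "flag_ids": { "type": "integer" },
      "doc": {
        "properties": {
          "sku": {
            "type": "text",
            "fields": {
              "keyword": { "type": "keyword" }
            }
          }
        }
      }
    }
  }
}
```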
Here's the problem:
Scenario 1: Using Top-Level knn (Fast, but Filter Inaccurate with collapse)
When I place the knn configuration at the top level of my search request, it's very fast. However, I observe an unexpected behavior with collapse and inner_hits:
Example:
- `offer1` (on Shard1) belongs to `SKU_X` and matches my `knn`'s `filter` (no `flag_id` 21 or 22).
- `offer2` (on Shard2) also belongs to `SKU_X` but does NOT match my `knn`'s `filter` (it has `flag_id` 21 or 22).
Observation: Despite offer2 failing the knn's filter, if offer1 is a strong kNN match, the final collapsed results for SKU_X still include offer2 within the inner_hits. If I run the same knn query without collapse, offer2 is correctly excluded.
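For reference, a minimal sketch of the Scenario 1 request (index/field names are placeholders for my actual schema, and the query vector is elided for brevity):

```json
GET /offers/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.34, ...],
    "k": 50,
    "num_candidates": 500,
    "filter": {
      "bool": {
        "must_not": { "terms": { "flag_ids": [21, 22] } }
      }
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": {
      "name": "best_offer",
      "size": 1
    }
  }
}
```

With this request, `offer2` still shows up in `best_offer` inner hits for `SKU_X`; without the `collapse` block, it is correctly excluded.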
My Question:
- Why does the `filter` specified within a top-level `knn` query not reliably prevent documents like `offer2` from being returned via `inner_hits` within a collapsed group, even when `offer2` itself does not satisfy that filter? Is the `knn` filter's scope not extended to `inner_hits`?
Scenario 2: Using knn within the query DSL (Accurate, but Very Slow)
When I embed the knn query within the query DSL (e.g., inside a bool -> must clause), the filtering behaves correctly: offer2 is accurately excluded as expected, even with collapse and inner_hits.
Observation: While accurate, this query pattern leads to a drastic increase in latency and consistently pushes my cluster's CPU usage to 100%.
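The Scenario 2 variant looks like this — same placeholder names and elided vector as above, with the `knn` moved into a `bool` -> `must` clause (I'm on 8.17, where the `knn` query accepts `k` directly; I use the same `k` and `num_candidates` in both scenarios):

```json
GET /offers/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.12, -0.34, ...],
            "k": 50,
            "num_candidates": 500,
            "filter": {
              "bool": {
                "must_not": { "terms": { "flag_ids": [21, 22] } }
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": {
      "name": "best_offer",
      "size": 1
    }
  }
}
```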
My Question:
- What fundamentally causes `knn` queries embedded within the `query` DSL to be so much slower and more resource-intensive than the top-level `knn` search, even with identical `k` and `num_candidates` values? Is there a difference in how the underlying kNN search is executed or how candidates are gathered in these two contexts?
My Overall Goal:
I need to achieve both:
- Correct filter application: Ensuring only offers matching the kNN filter are considered for hits and `inner_hits`.
- High performance: Maintaining low latency and reasonable CPU usage.
Any insights into these behaviors and strategies for achieving my objective would be greatly appreciated!
Thank you for your time and expertise.