Top-Level knn with collapse: Accuracy-Performance Trade-off & Filter Behavior

Hello Elasticsearch Community,

I'm facing a critical challenge trying to balance search accuracy and performance when combining kNN queries with collapse and inner_hits in Elasticsearch 8.17. I'm seeing a puzzling filter behavior and significant performance differences between two common kNN query patterns.

My goal is to retrieve offers (documents) based on vector similarity, but strictly exclude any offers with certain flag_ids (21 or 22). I also need to collapse results by doc.sku.keyword and pull in the "best offer" within each collapsed group using inner_hits.

Here's the problem:

Scenario 1: Using Top-Level knn (Fast, but Filter Inaccurate with collapse)

When I place the knn configuration at the top level of my search request, it's very fast. However, I observe an unexpected behavior with collapse and inner_hits:

Example:

  • offer1 (on Shard1) belongs to SKU_X and matches my knn's filter (no flag_id 21 or 22).
  • offer2 (on Shard2) also belongs to SKU_X but does NOT match my knn's filter (it has flag_id 21 or 22).

Observation: Despite offer2 failing the knn's filter, if offer1 is a strong kNN match, the final collapsed results for SKU_X still include offer2 within the inner_hits. If I run the same knn query without collapse, offer2 is correctly excluded.
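For reference, the request has roughly this shape (vector truncated to three values; the `embedding` and `flag_ids` field names are simplified stand-ins for my actual mapping):

```json
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, 0.45, 0.33],
    "k": 50,
    "num_candidates": 500,
    "filter": {
      "bool": {
        "must_not": { "terms": { "flag_ids": [21, 22] } }
      }
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": {
      "name": "best_offer",
      "size": 1
    }
  }
}
```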

My Question:

  • Why does the filter specified within a top-level knn query not reliably prevent documents like offer2 from being returned via inner_hits within a collapsed group, even when offer2 itself does not satisfy that filter? Is the knn filter's scope not extended to inner_hits?

Scenario 2: Using knn within the query DSL (Accurate, but Very Slow)

When I embed the knn query within the query DSL (e.g., inside a bool -> must clause), the filtering behaves correctly: offer2 is accurately excluded as expected, even with collapse and inner_hits.
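Concretely, this variant moves the same knn definition inside the query section (again with simplified, illustrative field names):

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.12, 0.45, 0.33],
            "num_candidates": 500,
            "filter": {
              "bool": {
                "must_not": { "terms": { "flag_ids": [21, 22] } }
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": {
      "name": "best_offer",
      "size": 1
    }
  }
}
```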

Observation: While accurate, this query pattern leads to a drastic increase in latency and consistently pushes my cluster's CPU usage to 100%.

My Question:

  • What fundamentally causes knn queries embedded within the query DSL to be so much slower and resource-intensive compared to the top-level knn search, even with identical k and num_candidates values? Is there a difference in how the underlying kNN search is executed or how candidates are gathered in these two contexts?

My Overall Goal:

I need to achieve both:

  1. Correct filter application: Ensuring only offers matching the kNN filter are considered for hits and inner_hits.
  2. High performance: Maintaining low latency and reasonable CPU usage.

Any insights into these behaviors and strategies for achieving my objective would be greatly appreciated!

Thank you for your time and expertise.

Hello,

You're encountering a subtle but important nuance in how Elasticsearch handles kNN queries, filtering, and result collapsing, particularly when using inner_hits. Here's a breakdown of what's happening in both scenarios:


:magnifying_glass_tilted_left: Scenario 1: Top-Level knn – Fast but Inaccurate Filtering

When you define the knn block at the top level of the request (outside the query DSL), Elasticsearch runs the approximate vector search per shard and applies your filter during candidate collection, so the top-level hits themselves do respect the filter. Here's the catch:

  • Collapse groups those kNN hits by sku.keyword, and inner_hits is resolved afterwards by a separate expansion query issued for each collapsed group.
  • That expansion query does not inherit the top-level knn filter, so documents that fail it (like those with flag_id 21 or 22) can still surface in inner_hits whenever they share a collapse key with a valid kNN match.

So yes, the filter in top-level knn is not enforced within the inner_hits context. This is a known limitation that follows from how the collapse/inner_hits expansion query is built, and it is part of what keeps the top-level kNN path fast: the expensive vector search runs once, and group expansion is a cheap secondary lookup that skips the filter.


:white_check_mark: Scenario 2: knn within query DSL – Accurate but Slow

Embedding the knn query inside the regular query DSL (e.g., inside a bool -> must clause) changes the execution path:

  • The knn clause is evaluated like any other query clause, so collapse and inner_hits are resolved against the filtered result set, and offer2 is correctly excluded everywhere.
  • The cost profile changes, too. The top-level knn search returns only the global top k hits, whereas the knn query hands up to num_candidates matches per shard onward to the rest of the query, collapse, and inner_hits machinery, so far more documents flow through scoring and grouping.
  • On top of that, collapse resolves inner_hits with an additional query per collapsed group; when the knn clause lives inside the query DSL, that vector search can effectively be re-executed for each group, multiplying the work.

So yes, this approach is slower: the accuracy comes precisely from pushing the kNN clause through the full query pipeline, which forfeits the shortcuts the top-level knn syntax takes.


:bullseye: Your Goal: Accuracy + Performance

To achieve both strict filtering and fast response times, consider:

  1. Pre-filtering at Index Time:
  • If possible, exclude documents with flag_id 21 or 22 during indexing, or route them to a separate index.
  • This shrinks the candidate pool and removes the need to filter them out at query time.
  2. Exact kNN via script_score:
  • Run a script_score query whose inner query is your flag_id exclusion filter and whose script computes the vector similarity (e.g., cosineSimilarity). Filtering is then strict by construction, since only documents that pass the filter are ever scored.
  • This is brute-force scoring, so it is only practical when the filter narrows the index to a modest number of documents; it trades CPU for correctness.
  3. Post-Processing Option:
  • If strict filtering on inner_hits isn't supported natively for your pattern, retrieve slightly broader inner_hits and drop the disallowed documents in your application logic.
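The script_score workaround can be sketched as follows: an exact kNN that computes cosineSimilarity over only the documents passing the flag filter (the `embedding` and `flag_ids` field names are illustrative; adapt them to your mapping):

```json
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "must_not": { "terms": { "flag_ids": [21, 22] } }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
        "params": { "query_vector": [0.12, 0.45, 0.33] }
      }
    }
  },
  "collapse": {
    "field": "doc.sku.keyword",
    "inner_hits": { "name": "best_offer", "size": 1 }
  }
}
```

Because the filter query runs before scoring, disallowed offers can never leak into hits or inner_hits; the `+ 1.0` simply keeps scores non-negative as required by script_score.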

:pushpin: Final Thought

Balancing speed and strict filtering in Elasticsearch’s kNN functionality can be complex due to trade-offs in query execution paths. You're correct in your observations: embedding kNN in the query DSL enforces the filter consistently through collapse and inner_hits, while top-level kNN is faster but does not carry its filter into the inner_hits expansion. Keep a close eye on filter placement, as it directly affects which documents appear in inner_hits.
