Hybrid Search aggregations count mismatch on filters

I want to use aggregations on the hybrid search (query + knn), which will give me some facets that I can select in the UI and use as filters for subsequent queries. I'm using num_candidates=100 and k=20.

I read in the documentation that aggregations are calculated on the top k nearest documents. If the request also includes a query, aggregations are calculated on the combined set of knn and query matches.

I'm seeing some count mismatches happening when the filters are applied.

Scenario 1:
Hybrid search with no filters applied - total count (50)
Input: Black sports shoes
aggregations:
Nike - 20
Adidas - 15
Puma - 15

Scenario 2:
Faceted Hybrid search with 1 filter - total count (30)
selected filter: Nike

From the above observation, when I selected the Nike brand and set it as a filter for the Hybrid search query, it gave more results (30) than the initial count (20) from the aggregations result. Is this because the filter is applied as a prefilter on the "brand" field, so the knn search runs only over the Nike documents and pulls up more records?

I want to make the count result consistent even after applying the filters so that users won't be confused about the total number of results found.

Is there something I'm doing wrong or something I'm missing? Please suggest how to handle this scenario.

Hi Ramgopal,

The challenge here is that with KNN, everything is similar to some degree. When you apply a filter to KNN, you still get back k results, albeit ones slightly more distant from the query than the previous results.

It gets more difficult in hybrid search, where results from different queries (BM25, KNN) are combined, so the counts from one request aren't representative of the next set of results.

One way you could improve this experience is to apply a similarity threshold to the KNN search, so that only documents within a given distance of the query qualify as matches.
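To make that concrete, here is a small toy simulation in plain Python (not Elasticsearch code). The documents, brands, and distances are made up for illustration, and `knn` is a hypothetical helper that does an exact nearest-neighbour sort with an optional brand prefilter.

```python
from collections import Counter

# Hypothetical toy corpus: (doc_id, brand, distance to the query vector).
# A smaller distance means the document is closer to the query.
DOCS = [
    (1, "Nike", 0.10), (2, "Adidas", 0.12), (3, "Nike", 0.15),
    (4, "Puma", 0.18), (5, "Adidas", 0.20), (6, "Nike", 0.35),
    (7, "Nike", 0.40), (8, "Puma", 0.45), (9, "Nike", 0.50),
]

def knn(docs, k, brand=None):
    """Return the k nearest docs, optionally prefiltered by brand."""
    candidates = [d for d in docs if brand is None or d[1] == brand]
    return sorted(candidates, key=lambda d: d[2])[:k]

# Unfiltered search: facet counts are computed over the top k only.
unfiltered = knn(DOCS, k=4)
facet_counts = Counter(brand for _, brand, _ in unfiltered)

# Filtered search: the prefilter leaves only Nike docs, and k is filled again.
filtered = knn(DOCS, k=4, brand="Nike")

print(facet_counts["Nike"])  # Nike count shown in the facet
print(len(filtered))         # Nike results after selecting the facet
```

The facet is counted over the unfiltered top k, while the filtered search fills k again from Nike documents that were too far away to make the first list. That is the same shape of mismatch as the 20 versus 30 in your scenarios.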
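As a rough sketch of what that could look like, the helper below builds a search request body as a Python dict. The `knn` section uses the `similarity` option of the Elasticsearch kNN search, which is available in recent versions. The field names `embedding`, `title`, and `brand`, the example vector, the threshold value 0.8, and the function `build_hybrid_request` itself are all assumptions for illustration. How the `similarity` value is interpreted depends on the similarity metric configured on the vector field, so check that before picking a number.

```python
def build_hybrid_request(text, query_vector, brand=None, min_similarity=None):
    """Build a hybrid search body: text query + knn clause + brand facet."""
    knn = {
        "field": "embedding",        # hypothetical dense_vector field
        "query_vector": query_vector,
        "k": 20,
        "num_candidates": 100,
    }
    # Hypothetical text field for the BM25 side of the hybrid search.
    query = {"bool": {"must": [{"match": {"title": text}}]}}

    if min_similarity is not None:
        # Only vectors that meet this threshold qualify as kNN matches.
        knn["similarity"] = min_similarity

    if brand is not None:
        # The same filter is applied to both sides, so the selected facet
        # restricts the kNN search and the text query alike.
        brand_filter = {"term": {"brand": brand}}
        knn["filter"] = brand_filter
        query["bool"]["filter"] = [brand_filter]

    return {
        "query": query,
        "knn": knn,
        "aggs": {"brands": {"terms": {"field": "brand"}}},
    }

body = build_hybrid_request(
    "Black sports shoes", [0.1, 0.2, 0.3], brand="Nike", min_similarity=0.8
)
```

With the same threshold on the unfiltered and the filtered request, the KNN side can no longer reach for more distant documents just to fill k. The text query side can still contribute its own matches, so this narrows the gap rather than guaranteeing identical counts.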

Joe
