For a product search, I would like to return a list of products and display the best-matching variant of each product.
My idea is to index the variants as a nested property on the product document, and then use inner_hits to get the best-matching variant for each hit in the result set.
This would still let me page through the results normally, since the product document is the main hit.
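A minimal sketch of what that nested approach might look like as request bodies, built as plain Python dicts. The field names ("name", "variants", "variants.name", "variants.color") are hypothetical placeholders. The nested query scores each product by its best variant (score_mode "max"), inner_hits with size 1 returns only that top variant, and _source filtering keeps the full variant list out of the response. Note that Elasticsearch also caps the number of nested objects per document through the index.mapping.nested_objects.limit setting (10,000 by default), which matters for the 60,000-variant case.

```python
import json

# Hypothetical mapping: product fields plus variants stored as nested objects.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "variants": {
                "type": "nested",
                "properties": {
                    "name": {"type": "text"},
                    "color": {"type": "keyword"},
                },
            },
        }
    }
}


def product_search(text, page=0, page_size=20):
    """Build a search body where each hit is a product, scored by its best variant."""
    return {
        # Normal from/size paging works because products are the top-level hits.
        "from": page * page_size,
        "size": page_size,
        # Don't send the (possibly huge) variants array back with every product.
        "_source": {"excludes": ["variants"]},
        "query": {
            "nested": {
                "path": "variants",
                "query": {"match": {"variants.name": text}},
                # Score each product by its single best-matching variant.
                "score_mode": "max",
                # Return only the top-scoring variant for each product hit.
                "inner_hits": {"size": 1},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(product_search("red shirt", page=2), indent=2))
```

The best variant for each product then appears under inner_hits in each hit, so the UI can show it next to the product.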
However, some products in the data have a huge number of variants: the worst case is a whopping 60,000 variants for a single product, while other products have only a single variant.
Even fetching a single document with 60,000 variants indexed in it would probably pose a challenge.
An alternative strategy is to index each variant as its own document, and then use a terms aggregation on the product ID to get unique product hits. But this makes paging by score harder.
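A rough sketch of that alternative, again as plain dicts and assuming a hypothetical product_id keyword field on each variant document. Each product bucket holds its best variant via a top_hits sub-aggregation, and buckets are ordered by a max aggregation over _score. The terms aggregation itself has no offset parameter, so reaching page N means requesting enough buckets to cover every earlier page and slicing on the client, which is where the paging cost comes from.

```python
def variant_search(text, page=0, page_size=20):
    """Search variant docs and group them into product buckets (hypothetical field names)."""
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {"match": {"name": text}},
        "aggs": {
            "products": {
                "terms": {
                    "field": "product_id",
                    # No offset: ask for every bucket up to the end of the requested page.
                    "size": (page + 1) * page_size,
                    # Order product buckets by their best variant's score.
                    "order": {"top_score": "desc"},
                },
                "aggs": {
                    "best_variant": {"top_hits": {"size": 1}},
                    "top_score": {"max": {"script": {"source": "_score"}}},
                },
            }
        },
    }


def page_of_buckets(response, page=0, page_size=20):
    """Drop the buckets that belong to earlier pages, client-side."""
    buckets = response["aggregations"]["products"]["buckets"]
    return buckets[page * page_size:(page + 1) * page_size]
```

For example, page 3 with 20 products per page makes Elasticsearch build 80 buckets even though only the last 20 are shown.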
Do you have any advice on how this kind of data could be indexed and queried so that the query matches on variant data but returns only one result per product?