Hey,
We have an index of unique products where each document represents a single product, with the following fields: product_id, group_id, group_score, and product_score.
Consider the following index:
{
"product_id": "100-001",
"group_id": "100",
"group_score": 100,
"product_score": 60,
},
{
"product_id": "100-002",
"group_id": "100",
"group_score": 100,
"product_score": 40,
},
{
"product_id": "100-001",
"group_id": "100",
"group_score": 100,
"product_score": 50,
},
{
"product_id": "200-001",
"group_id": "200",
"group_score": 73,
"product_score": 20,
},
{
"product_id": "200-002",
"group_id": "200",
"group_score": 73,
"product_score": 53,
}
Every group contains ~1-200 products.
We are trying to a query that matches the following conditions:
- Products should be sorted by their group_score (desc).
- No more than one product per group_id.
- Get the product with the highest product_score within the group.
For example, applying the query on the above should return:
{
"product_id": "100-001"
},
{
"product_id": "200-002"
}
We ended up with the following query:
{
"size": 0,
"aggs": {
"group_by_group_id": {
"terms": {
"field": "group_id",
"order":{
"max_group_score":"desc"
}
},
"aggs": {
"top_scores_hits": {
"top_hits": {
"sort": [
{
"product_score": {
"order": "desc"
}
}
],
"size": 1
}
},
"max_group_score":{
"max":{
"field":"group_score"
}
}
}
}
}
}
The problem is that the query is really slow because of the aggregations and the search performance is important.
We would love to hear your opinion about a better/efficient solution.
Changing the index structure is tolerable.