Unique values query without aggregations


(Itay Bittan) #1

Hey,

We have an index of unique products where each document represents a single product, with the following fields: product_id, group_id, group_score, and product_score.
Consider the following index:
{
  "product_id": "100-001",
  "group_id": "100",
  "group_score": 100,
  "product_score": 60
},
{
  "product_id": "100-002",
  "group_id": "100",
  "group_score": 100,
  "product_score": 40
},
{
  "product_id": "100-003",
  "group_id": "100",
  "group_score": 100,
  "product_score": 50
},
{
  "product_id": "200-001",
  "group_id": "200",
  "group_score": 73,
  "product_score": 20
},
{
  "product_id": "200-002",
  "group_id": "200",
  "group_score": 73,
  "product_score": 53
}

Each group contains roughly 1 to 200 products.
We are trying to write a query that satisfies the following conditions:

  1. Products should be sorted by their group_score (desc).
  2. No more than one product per group_id.
  3. Get the product with the highest product_score within the group.

For example, running the query against the index above should return:
{
  "product_id": "100-001"
},
{
  "product_id": "200-002"
}
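
To pin down the three conditions, here is a minimal in-memory sketch in Python of the result we expect. It is only a reference for the semantics, not something to run against a real index. The function name is made up for illustration, and the third document of group 100 is given the id "100-003" here (an assumption) so that product ids stay unique.

```python
# Sample documents from the post. The third document is given the id
# "100-003" here (an assumption) so that product ids stay unique.
DOCS = [
    {"product_id": "100-001", "group_id": "100", "group_score": 100, "product_score": 60},
    {"product_id": "100-002", "group_id": "100", "group_score": 100, "product_score": 40},
    {"product_id": "100-003", "group_id": "100", "group_score": 100, "product_score": 50},
    {"product_id": "200-001", "group_id": "200", "group_score": 73, "product_score": 20},
    {"product_id": "200-002", "group_id": "200", "group_score": 73, "product_score": 53},
]


def best_product_per_group(docs):
    """Return one product_id per group, ordered by group_score descending."""
    best = {}  # group_id -> document with the highest product_score seen so far
    for doc in docs:
        current = best.get(doc["group_id"])
        if current is None or doc["product_score"] > current["product_score"]:
            best[doc["group_id"]] = doc
    # Condition 1: order the surviving documents by their group's score.
    ranked = sorted(best.values(), key=lambda d: d["group_score"], reverse=True)
    return [d["product_id"] for d in ranked]


print(best_product_per_group(DOCS))  # ['100-001', '200-002']
```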

We ended up with the following query:
{
  "size": 0,
  "aggs": {
    "group_by_group_id": {
      "terms": {
        "field": "group_id",
        "order": {
          "max_group_score": "desc"
        }
      },
      "aggs": {
        "top_scores_hits": {
          "top_hits": {
            "sort": [
              {
                "product_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        },
        "max_group_score": {
          "max": {
            "field": "group_score"
          }
        }
      }
    }
  }
}
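
Because the request sets "size": 0, the products come back inside the aggregation buckets rather than in the top-level hits. A small Python sketch of how the product ids can be read out of the response is below. The helper name is hypothetical, and the response fragment is hand-written for illustration, not real cluster output.

```python
def product_ids_from_response(response):
    """Collect the top product_id of each bucket, in bucket order."""
    buckets = response["aggregations"]["group_by_group_id"]["buckets"]
    product_ids = []
    for bucket in buckets:
        hits = bucket["top_scores_hits"]["hits"]["hits"]
        if hits:  # top_hits uses size 1, so there is at most one hit per bucket
            product_ids.append(hits[0]["_source"]["product_id"])
    return product_ids


# Hand-written response fragment for illustration only.
example_response = {
    "aggregations": {
        "group_by_group_id": {
            "buckets": [
                {
                    "key": "100",
                    "doc_count": 3,
                    "max_group_score": {"value": 100.0},
                    "top_scores_hits": {
                        "hits": {"hits": [{"_source": {"product_id": "100-001"}}]}
                    },
                },
                {
                    "key": "200",
                    "doc_count": 2,
                    "max_group_score": {"value": 73.0},
                    "top_scores_hits": {
                        "hits": {"hits": [{"_source": {"product_id": "200-002"}}]}
                    },
                },
            ]
        }
    }
}

print(product_ids_from_response(example_response))  # ['100-001', '200-002']
```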

The problem is that the query is really slow because of the aggregations, and search performance is important to us.

We would love to hear your opinion on a better, more efficient solution.
We can tolerate changing the index structure.
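
One candidate that avoids aggregations altogether is field collapsing, the `collapse` parameter of the search API. The Python sketch below only builds such a request body. It assumes an Elasticsearch version that supports `collapse` and that group_id is a single-valued keyword or numeric field; the function name is hypothetical. Whether it is actually faster than the aggregation would need to be measured.

```python
def collapsed_search_body(size=10):
    """Build a search body that returns at most one product per group_id."""
    return {
        "size": size,
        "query": {"match_all": {}},
        # Keep only the top-sorted document for each distinct group_id.
        "collapse": {"field": "group_id"},
        "sort": [
            # group_score is the same for every product in a group, so it
            # orders the groups; product_score then picks the best product.
            {"group_score": {"order": "desc"}},
            {"product_score": {"order": "desc"}},
        ],
        "_source": ["product_id"],
    }


body = collapsed_search_body(size=10)
```

With this shape the results are ordinary hits under hits.hits, already ordered by group_score.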


(Mark Harwood) #2

I imagine the slow bit might be the top-hits. Try removing that to check.

There will be redundancy in the top-hits retrieval: each shard returns more than the 10 groups that end up in the response (the shard_size setting controls this number), but in the final reduction many of the retrieved hits are discarded because their groups are not competitive in the overall results.
If that's the case, making two requests may be faster: one to find the top groups, then another to fetch the top products for just those groups.
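
A rough Python sketch of that two-request idea follows. It only builds the request bodies and reads the responses; sending them to the cluster is left out. All function names are hypothetical, and the exact shape of the second request is an assumption, since the suggestion above does not spell it out.

```python
def top_groups_request(n_groups=10):
    """Request 1: rank groups by group_score, with no top_hits attached."""
    return {
        "size": 0,
        "aggs": {
            "group_by_group_id": {
                "terms": {
                    "field": "group_id",
                    "size": n_groups,
                    "order": {"max_group_score": "desc"},
                },
                "aggs": {"max_group_score": {"max": {"field": "group_score"}}},
            }
        },
    }


def group_ids_from_response(response):
    """Read the winning group ids, in ranked order, from request 1."""
    buckets = response["aggregations"]["group_by_group_id"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def top_products_request(group_ids):
    """Request 2: fetch the best product, but only for the selected groups."""
    if not group_ids:
        raise ValueError("no groups selected; skip the second request")
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"group_id": group_ids}}]}},
        "aggs": {
            "group_by_group_id": {
                "terms": {"field": "group_id", "size": len(group_ids)},
                "aggs": {
                    "top_scores_hits": {
                        "top_hits": {
                            "sort": [{"product_score": {"order": "desc"}}],
                            "size": 1,
                        }
                    }
                },
            }
        },
    }


def top_products_in_group_order(group_ids, response):
    """Re-apply the ranking from request 1 to the buckets of request 2."""
    by_group = {}
    for bucket in response["aggregations"]["group_by_group_id"]["buckets"]:
        hits = bucket["top_scores_hits"]["hits"]["hits"]
        if hits:
            by_group[bucket["key"]] = hits[0]["_source"]["product_id"]
    return [by_group[g] for g in group_ids if g in by_group]
```

The second request does not order its buckets by group_score, so the last helper restores the ranking from the first response on the client side.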

