I'm trying to construct a query with some aggregations based on filters of nested fields. Let me briefly explain my data models so the examples are easier to follow.
I've got a Winery model, which has many Wine models. The Wines that belong to a Winery are Wines that the Winery produced. I'm going to simplify my models for the sake of a simpler example. Lets assume there are no attributes of a Winery, other than an a unique ID. And lets assume that the only attributes of a Wine are a price, and a score (representing a critic's score).
When we search over Wineries, I want to return aggregations that tell me how many wineries match for each given price range and each given score range.
Here's an example of a search that we use, that is not returning the information I'm wanting.
GET wineries_development/_search
{
"query": {
"match_all": {}
},
"size": 0,
"from": 0,
"timeout": "11s",
"_source": false,
"post_filter": {
"nested": {
"path": "wines",
"query": {
"bool": {
"must": [
{
"range": {
"wines.scores": {
"gte": 100
}
}
},
{
"range": {
"wines.price": {
"lt": 3100
}
}
}
]
}
}
}
},
"aggregations": {
"wines.price": {
"filter": {
"nested": {
"path": "wines",
"query": {
"bool": {
"must": [
{
"range": {
"wines.scores": {
"gte": 100
}
}
}
]
}
}
}
},
"aggs": {
"price": {
"range": {
"field": "wines.price",
"keyed": true,
"ranges": [
{
"to": 3100,
"key": "<30"
},
{
"from": 3100,
"to": 5100,
"key": "31-51"
},
{
"from": 5100,
"to": 10100,
"key": "51-101"
},
{
"from": 10100,
"to": 25100,
"key": "101-251"
},
{
"from": 25100,
"key": "251+"
}
]
}
}
}
},
"wines.scores": {
"filter": {
"nested": {
"path": "wines",
"query": {
"bool": {
"must": [
{
"range": {
"wines.price": {
"lt": 3100
}
}
}
]
}
}
}
},
"aggs": {
"scores": {
"range": {
"field": "wines.scores",
"keyed": true,
"ranges": [
{
"from": 100,
"key": "100"
},
{
"from": 97,
"to": 100,
"key": "97-99"
},
{
"from": 94,
"to": 97,
"key": "94-96"
},
{
"from": 91,
"to": 94,
"key": "91-93"
},
{
"to": 91,
"key": "<90"
}
]
}
}
}
}
}
}
Which is returning this:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"aggregations": {
"wines.scores": {
"doc_count": 741,
"scores": {
"buckets": {
"100": {
"from": 100,
"doc_count": 2
},
"<90": {
"to": 91,
"doc_count": 142
},
"91-93": {
"from": 91,
"to": 94,
"doc_count": 179
},
"94-96": {
"from": 94,
"to": 97,
"doc_count": 72
},
"97-99": {
"from": 97,
"to": 100,
"doc_count": 18
}
}
}
},
"wines.price": {
"doc_count": 21,
"price": {
"buckets": {
"<30": {
"to": 3100,
"doc_count": 2
},
"31-51": {
"from": 3100,
"to": 5100,
"doc_count": 4
},
"51-101": {
"from": 5100,
"to": 10100,
"doc_count": 10
},
"101-251": {
"from": 10100,
"to": 25100,
"doc_count": 12
},
"251+": {
"from": 25100,
"doc_count": 19
}
}
}
}
}
}
So this query says: "Get me all the wineries that have a wine that is both under $31 and has a score of 100". No such wines exist, and so no wineries should be returned. But if we look at the aggregation counts for the "100" score bucket, or the "<31" price bucket, we see that it gives a count of 2.
I understand why it's giving a count of 2. There are Wineries that exist that have a wine that is scored 100, and have a different wine that is priced under $31. Two separate wines.
The nested query that actually gets the results in the post_filter is doing the correct thing. Looking for a winery that has a single wine that matches the given criteria. But the aggregations have a filter aggregation that does the filter on the other field, and then has a range sub-aggregation which no longer factors in the filtered out wines in the filter aggregation.
I know that this is not a bug, I'm just not doing the right thing or it's not possible to do in the way that I want.
If anyone has any ideas on how to achieve this kind of aggregation I would love to hear about it!
Thanks for taking a look!