Aggregations based on filtered nested fields

I'm trying to construct a query with some aggregations based on filters of nested fields. Let me briefly explain my data models so the examples are easier to follow.

I've got a Winery model, which has many Wine models. The Wines that belong to a Winery are Wines that the Winery produced. I'm going to simplify my models for the sake of a simpler example. Lets assume there are no attributes of a Winery, other than an a unique ID. And lets assume that the only attributes of a Wine are a price, and a score (representing a critic's score).

When we search over Wineries, I want to return aggregations that tell me how many wineries match for each given price range and each given score range.

Here's an example of a search that we use, that is not returning the information I'm wanting.

GET wineries_development/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "from": 0,
  "timeout": "11s",
  "_source": false,
  "post_filter": {
    "nested": {
      "path": "wines",
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "wines.scores": {
                  "gte": 100
                }
              }
            },
            {
              "range": {
                "wines.price": {
                  "lt": 3100
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggregations": {
    "wines.price": {
      "filter": {
        "nested": {
          "path": "wines",
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "wines.scores": {
                      "gte": 100
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "aggs": {
        "price": {
          "range": {
            "field": "wines.price",
            "keyed": true,
            "ranges": [
              {
                "to": 3100,
                "key": "<30"
              },
              {
                "from": 3100,
                "to": 5100,
                "key": "31-51"
              },
              {
                "from": 5100,
                "to": 10100,
                "key": "51-101"
              },
              {
                "from": 10100,
                "to": 25100,
                "key": "101-251"
              },
              {
                "from": 25100,
                "key": "251+"
              }
            ]
          }
        }
      }
    },
    "wines.scores": {
      "filter": {
        "nested": {
          "path": "wines",
          "query": {
            "bool": {
              "must": [
                {
                  "range": {
                    "wines.price": {
                      "lt": 3100
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "aggs": {
        "scores": {
          "range": {
            "field": "wines.scores",
            "keyed": true,
            "ranges": [
              {
                "from": 100,
                "key": "100"
              },
              {
                "from": 97,
                "to": 100,
                "key": "97-99"
              },
              {
                "from": 94,
                "to": 97,
                "key": "94-96"
              },
              {
                "from": 91,
                "to": 94,
                "key": "91-93"
              },
              {
                "to": 91,
                "key": "<90"
              }
            ]
          }
        }
      }
    }
  }
}

Which is returning this:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "wines.scores": {
      "doc_count": 741,
      "scores": {
        "buckets": {
          "100": {
            "from": 100,
            "doc_count": 2
          },
          "<90": {
            "to": 91,
            "doc_count": 142
          },
          "91-93": {
            "from": 91,
            "to": 94,
            "doc_count": 179
          },
          "94-96": {
            "from": 94,
            "to": 97,
            "doc_count": 72
          },
          "97-99": {
            "from": 97,
            "to": 100,
            "doc_count": 18
          }
        }
      }
    },
    "wines.price": {
      "doc_count": 21,
      "price": {
        "buckets": {
          "<30": {
            "to": 3100,
            "doc_count": 2
          },
          "31-51": {
            "from": 3100,
            "to": 5100,
            "doc_count": 4
          },
          "51-101": {
            "from": 5100,
            "to": 10100,
            "doc_count": 10
          },
          "101-251": {
            "from": 10100,
            "to": 25100,
            "doc_count": 12
          },
          "251+": {
            "from": 25100,
            "doc_count": 19
          }
        }
      }
    }
  }
}

So this query says: "Get me all the wineries that have a wine that is both under $31 and has a score of 100". No such wines exist, and so no wineries should be returned. But if we look at the aggregation counts for the "100" score bucket, or the "<31" price bucket, we see that it gives a count of 2.

I understand why it's giving a count of 2. There are Wineries that exist that have a wine that is scored 100, and have a different wine that is priced under $31. Two separate wines.

The nested query that actually gets the results in the post_filter is doing the correct thing. Looking for a winery that has a single wine that matches the given criteria. But the aggregations have a filter aggregation that does the filter on the other field, and then has a range sub-aggregation which no longer factors in the filtered out wines in the filter aggregation.

I know that this is not a bug, I'm just not doing the right thing or it's not possible to do in the way that I want.

If anyone has any ideas on how to achieve this kind of aggregation I would love to hear about it!

Thanks for taking a look!

I believe you are after something like that:

GET wineries_development/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "from": 0,
  "timeout": "11s",
  "_source": false,
  "post_filter": {
    "nested": {
      "path": "wines",
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "wines.scores": {
                  "gte": 100
                }
              }
            },
            {
              "range": {
                "wines.price": {
                  "lt": 3100
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggregations": {
    "wines": {
      "nested": {
        "path": "wines"
      },
      "aggregations": {
        "wines.price": {
          "filter": {
            "range": {
              "wines.scores": {
                "gte": 100
              }
            }
          },
          "aggs": {
            "price": {
              "range": {
                "field": "wines.price",
                "keyed": true,
                "ranges": [
                  {
                    "to": 3100,
                    "key": "&lt;30"
                  },
                  {
                    "from": 3100,
                    "to": 5100,
                    "key": "31-51"
                  },
                  {
                    "from": 5100,
                    "to": 10100,
                    "key": "51-101"
                  },
                  {
                    "from": 10100,
                    "to": 25100,
                    "key": "101-251"
                  },
                  {
                    "from": 25100,
                    "key": "251+"
                  }
                ]
              }
            }
          }
        }
      },
      "wines.scores": {
        "filter": {
          "range": {
            "wines.price": {
              "lt": 3100
            }
          }
        },
        "aggs": {
          "scores": {
            "range": {
              "field": "wines.scores",
              "keyed": true,
              "ranges": [
                {
                  "from": 100,
                  "key": "100"
                },
                {
                  "from": 97,
                  "to": 100,
                  "key": "97-99"
                },
                {
                  "from": 94,
                  "to": 97,
                  "key": "94-96"
                },
                {
                  "from": 91,
                  "to": 94,
                  "key": "91-93"
                },
                {
                  "to": 91,
                  "key": "&lt;90"
                }
              ]
            }
          }
        }
      }
    }
  }
}

Putting the nested aggregation at the top makes your aggregation work at the nested level rather than the top level.

Oh, interesting! Ok, so does that then mean that the counts that I'm getting back are now counts of the nested documents? IE: The aggregation values are now counts of wines rather than wineries?

Yes. You can use the reverse_nested aggregation if you want wineries counts back.

Out of curiosity, say I had winery_id (the id of the parent document) as a field of the nested wine document. Do you know off the top of your head if it would be more performant to do a cardinality aggregation on that winery_id field, or the reverse_nested?

And thank you so so much for your help. I have been stuck on this problem for about a month.

reverse_nested will be more efficient.

Awesome, thank you again so much for your help!

I'm not sure of the scope of your project, but if there ever is a website that uses this functionality to expose wine search, please share the link. :slight_smile:

Definitely! We're working on a major rewrite of the front end UI and searching capability. We're shooting to have it wrapped up in another month or two. I will post a link back here when it's up and live!