How can I get child aggregations to respect a query applied to parents?


(Ray Renteria) #1

Greetings,

I've successfully created parent and child aggregations matching a query that includes child document match criteria. The aggregations are correct for the "parent" document counts, but the counts for the children aggregations do not respect the query.

Q: How can I get child aggregations to respect a query?

Here are the steps to reproduce the problem:

Some sample data to reproduce: one index, two "companies" and five "contacts":

PUT testindex
{ "mappings" : { "contact" : {  "_parent" : { "type" : "company" } } } }

PUT testindex/company/1
{ "name" : "company foo" }

PUT testindex/company/2
{ "name" : "company bar" }

PUT testindex/contact/11?parent=1&refresh
{ "name" : "alpha bravo charlie" }

PUT testindex/contact/12?parent=1&refresh
{ "name" : "delta" }

PUT testindex/contact/13?parent=2&refresh
{ "name" : "alpha echo" }

PUT testindex/contact/14?parent=2&refresh
{  "name" : "foxtrot golf" }

PUT testindex/contact/15?parent=2&refresh
{ "name" : "foxtrot hotel" }

Here's the query (notice the child query for the term "echo"). One company and one contact meet this criterion:

POST testindex/company/_search
{
  "size" : 0,
  "query" : {
    "has_child" : {
      "type" : "contact",
      "query" : {
        "match" : {
          "name" : "echo"
        }
      }
    }
  },
  "aggs": {
    "unique_company_names": {
      "terms": {
        "field": "name.keyword"
      }
    },
    "contacts": {
      "children": {
        "type": "contact"
      },
      "aggs": {
        "unique_contact_names": {
          "terms": {
            "field": "name.keyword",
            "size": 500
          }
        }
      }
    }
  }
}

Here are the results. They are as expected for the companies: only one has a child document ("contact") with the term "echo" in its name. The child aggregation, however, counts all of the matched company's child documents. I'd like the child unique_contact_names buckets to contain only one entry:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_company_names": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "company bar",
          "doc_count": 1
        }
      ]
    },
    "contacts": {
      "doc_count": 3,
      "unique_contact_names": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "alpha echo",
            "doc_count": 1
          },
          {
            "key": "foxtrot golf",
            "doc_count": 1
          },
          {
            "key": "foxtrot hotel",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

Do I have to repeat the child query down into the child aggregations?

Your help would be greatly appreciated.

Thank you,

--Ray


(Ray Renteria) #2

Okay, I figured out a solution. I don't know if it's the best solution, but it worked for me and I thought I'd share it here. The "filters" aggregation type allows me to specify arbitrary criteria for each child aggregation. The only limitation is that I have to know the "buckets" ahead of time and name them. Terms aggregations, by contrast, create the bucket names dynamically from the values in the fields they aggregate.

Still, the Filters aggregation will allow me to get everything in a single query. Here is the new JSON query for posterity. Note the new child Filters aggregation called "contacts_matching_parent_aggregation_criteria". In my actual use case, there would be a list of filters, one per department, each yielding the number of contacts in that department.

POST testindex/company/_search
{
  "size" : 0,
  "query" : {
    "has_child" : {
      "type" : "contact",
      "query" : {
        "match" : {
          "name" : "echo"
        }
      }
    }
  },
  "aggs": {
    "unique_company_names": {
      "terms": {
        "field": "name.keyword"
      }
    },
    "contacts": {
      "children": {
        "type": "contact"
      },
      "aggs": {
        "unique_contact_names": {
          "terms": {
            "field": "name.keyword",
            "size": 500
          }
        },
        "contacts_matching_parent_aggregation_criteria": {
          "filters" : {
            "filters" : {
              "contacts_having_echo_in_their_name" : {
                "match" : {
                  "name" : "echo"
                }
              }
            }
          }
        }
      }
    }
  }
}

For those who aren't too familiar with the Elasticsearch query structure: an entire boolean query can be placed where I have the simple { "match" : { "name" : "echo" } }, which is what I do in my actual use case.
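
One possible variation on the query above, offered as an untested sketch rather than a confirmed fix: wrapping the terms aggregation in a single "filter" aggregation (singular, as opposed to "filters") should keep the dynamically generated buckets while counting only the contacts that match the child query. The aggregation name "contacts_matching_child_query" is made up for illustration, and the body assumes the same index, mapping, and Elasticsearch version as the requests above. It would be sent to the same endpoint, POST testindex/company/_search:

```json
{
  "size" : 0,
  "query" : {
    "has_child" : {
      "type" : "contact",
      "query" : {
        "match" : {
          "name" : "echo"
        }
      }
    }
  },
  "aggs": {
    "contacts": {
      "children": {
        "type": "contact"
      },
      "aggs": {
        "contacts_matching_child_query": {
          "filter": {
            "match": {
              "name": "echo"
            }
          },
          "aggs": {
            "unique_contact_names": {
              "terms": {
                "field": "name.keyword",
                "size": 500
              }
            }
          }
        }
      }
    }
  }
}
```

With the sample data above, unique_contact_names would be expected to contain only the "alpha echo" bucket. The child query still has to be repeated inside the aggregation, because the has_child query only selects parents and the children aggregation then collects every child of those parents.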

--Ray

