I am trying to use aggregations on a nested type. I want to apply a nested
filter so that the aggs run on a subset of docs, but when I do that the
"terms" aggregation counts nested docs, not parent docs. In other words, if
the same term is found in two different nested documents belonging to the
same parent, the count will be two. I want it to count only once per parent
document. Using "include_in_parent" does not help because then I cannot
apply the nested filter.
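To make the difference concrete, here is a small sketch in plain Python (no Elasticsearch involved) that counts the same filtered terms both ways. The documents and the field names "roles", "name" and "admin" are made up for illustration and are not taken from the gist.

```python
from collections import Counter

# Hypothetical parent docs, each holding nested "roles" docs.
parents = [
    {"id": 1, "roles": [
        {"name": "Role1", "admin": True},
        {"name": "Role1", "admin": True},
        {"name": "Role2", "admin": False},
    ]},
    {"id": 2, "roles": [{"name": "Role2", "admin": False}]},
]

def matches(role):
    # Stand-in for the nested filter: keep only admin roles.
    return role["admin"]

# What a terms agg under a nested filter does: one count per nested doc.
nested_counts = Counter(
    role["name"] for p in parents for role in p["roles"] if matches(role)
)

# What is wanted: one count per parent doc containing a matching term.
parent_counts = Counter(
    name
    for p in parents
    for name in {r["name"] for r in p["roles"] if matches(r)}
)

print(nested_counts)
print(parent_counts)
```

With this sample data the first counter reports "Role1" twice and the second reports it once, and neither includes "Role2" because the filter excludes it.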
Here is my gist showing three different approaches:
1. nested_aggs.sh: Did not use "include_in_parent". Counts "Role1" twice
   instead of once.
2. nested_include.sh: Added "include_in_parent". Counts "Role1" once,
   but also returns "Role2" since I cannot filter the nested docs.
3. nested_include_key.sh: My workaround for now. Uses
   "include_in_parent" and also stores my filter value in a new field
   called "roleAdminKey". The aggregation then uses the "include"
   parameter to apply my filter.
I have posted my results in the gist as well. While #3 above works, my
actual mapping contains many more fields with multiple levels of nesting
and I'd like to be able to apply several other filtered aggregations on
nested types without having to add a "*Key" field for each one.
Is there a way to filter "terms" aggregations on nested types while
returning the count of parent docs, not nested docs?
It is not possible to count parent documents yet, but this will hopefully
be available in Elasticsearch 1.2.0 via the reverse_nested
aggregation[1], which would translate nested doc IDs back to parent doc
IDs.
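As a rough sketch of what such a request might look like once reverse_nested is available, the snippet below builds a search body in Python and prints it as JSON. The nested path "roles", the fields "roles.name" and "roles.admin", and the aggregation names are all assumptions chosen for illustration, not the names used in the gist.

```python
import json

# Assumed names: the nested path "roles" and the fields "roles.name" and
# "roles.admin" are placeholders, not taken from the gist.
body = {
    "aggs": {
        "roles": {
            # Step into the nested docs.
            "nested": {"path": "roles"},
            "aggs": {
                "admin_only": {
                    # Filter the nested docs before bucketing.
                    "filter": {"term": {"roles.admin": True}},
                    "aggs": {
                        "role_names": {
                            "terms": {"field": "roles.name"},
                            "aggs": {
                                # Join back to parents so each bucket also
                                # reports a parent doc count.
                                "parent_docs": {"reverse_nested": {}}
                            },
                        }
                    },
                }
            },
        }
    }
}

print(json.dumps(body, indent=2))
```

In the response, each "role_names" bucket should then carry its usual doc_count (nested docs) alongside a "parent_docs" doc_count (parent docs), so the "*Key" workaround field would no longer be needed.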