If I have a parent and child relationship I can write a query to find
all parents that match criteria within the parent and other criteria
within the child.
I can use a has_child query to find the matching parents that also have
matching children. The desired scoring is based on relevancy. All is
well as long as none of the relevancy scoring is based on any fields in
the child, which is true in my case. Such a query works great until the
application wants to see the count of children: the query hits don't
give the count of children matched.
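For reference, here is a rough sketch of the kind of request body I mean, built as a plain Python dict. The type name "comment" and the fields "title" and "status" are made up for illustration, and wrapping both clauses in a bool query is just one way to combine them.

```python
def build_parent_query(parent_criteria, child_type, child_criteria):
    """Build a search body for parents that match parent_criteria and
    have at least one child of child_type matching child_criteria."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Criteria on the parent document itself.
                    parent_criteria,
                    # Criteria that some child of the parent must satisfy.
                    {"has_child": {"type": child_type, "query": child_criteria}},
                ]
            }
        }
    }


# Hypothetical field and type names, for illustration only.
body = build_parent_query(
    {"match": {"title": "elasticsearch"}},
    "comment",
    {"term": {"status": "approved"}},
)
```

The hits for this body are parents only, which is why nothing in the response says how many children matched.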
Count of matching children
Approach 1: has_child, then facet
One way that comes to mind to find the count of matching children per
parent is to run a terms_stats facet scoped to the matching children,
where the term is the "_parentId" and the stats are for any numeric
field in the child. The count statistic from the facet is then the
count of matching children for a particular parent.
The advantage of approach 1 is that the search hits for the query are
exactly the results, so "from", "size" and total hits are all just
what is needed.
The disadvantage of this approach is that the facet list returned might
be huge (the facet would include all matching document IDs and some
stats about them). In my case that might be tens of thousands of facet
values. That doesn't sound good at all.
Approach 2: has_parent
Invert the whole question. Start by looking for the children that match,
and add a has_parent clause for the parent criteria. That sounded good
to me until I realized that my ranking is based on a score of fields in
the parent, so I couldn't actually rank my hits as I wanted. It also
suffers from returning a list of children when what is wanted is a list
of parents, so everything requires post-processing after the fact.
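The inverted query might look roughly like the sketch below. Again the names ("article", "status", "title") are hypothetical, and I am assuming the has_parent clause takes the parent type under a "parent_type" key, which may differ by version.

```python
def build_child_query(child_criteria, parent_type, parent_criteria):
    """Build a search body for children that match child_criteria and
    whose parent of parent_type matches parent_criteria."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Criteria on the child document itself.
                    child_criteria,
                    # Criteria that the child's parent must satisfy.
                    # "parent_type" is an assumption about the key name.
                    {
                        "has_parent": {
                            "parent_type": parent_type,
                            "query": parent_criteria,
                        }
                    },
                ]
            }
        }
    }


# Hypothetical field and type names, for illustration only.
inverted = build_child_query(
    {"term": {"status": "approved"}},
    "article",
    {"match": {"title": "elasticsearch"}},
)
```

Every hit here is a child, so paging and totals count children rather than parents.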
Approach 3: 2nd Query
After getting the result set of parents, go back to the index to
retrieve all matching children, but only for the page of parents that we
are about to return to the application. It ought to be a quick query,
being limited by a list of IDs. Then we can report the count of
children for each parent ID. This does have the advantage that if the
application needs something else from the children, we could take the
opportunity at that point to get it.
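The counting step on the application side could be as small as the sketch below. It assumes the second query has already run and that each child hit is a dict carrying its parent's ID under a "_parent" key; that key name and the sample IDs are my assumptions, not something the index guarantees.

```python
from collections import Counter


def count_children_per_parent(parent_ids, child_hits):
    """Return {parent_id: number of matching children} for one page.

    parent_ids: IDs of the parents on the page being returned.
    child_hits: child hits from the second query; each is assumed to be
                a dict with the parent's ID under "_parent".
    """
    counts = Counter(hit["_parent"] for hit in child_hits)
    # Parents with no matching children in the second result get 0.
    return {pid: counts.get(pid, 0) for pid in parent_ids}


# Made-up sample data, for illustration only.
page_of_parents = ["p1", "p2", "p3"]
hits = [
    {"_id": "c1", "_parent": "p1"},
    {"_id": "c2", "_parent": "p1"},
    {"_id": "c3", "_parent": "p3"},
]
child_counts = count_children_per_parent(page_of_parents, hits)
```

The second query has to ask for enough hits to cover every child of the parents on the page, otherwise the counts come up short.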
Does anyone have any comments or suggestions on any of these approaches?