[ES 5.0.0-alpha3] top level inner hits replacement?

drallax · June 2, 2016, 1:12pm

In ES 5.0.0-alpha3, top-level inner hits have been removed
(https://github.com/elastic/elasticsearch/pull/17816)

Suppose I have an index with a parent type and a child type, and I'd like to search parents, returning parent documents along with their children.
I wonder what is now the best way to fetch the children of the top hits when the query has no has_child, e.g. a match-all query?

1) wrap the top-level query inside a boolean query as a filter|must clause
and add a should-clause containing a dummy has_child query to fetch the inner hits

or perhaps:
2) execute the query normally, without any inner hits, then grab the IDs of the top hits and execute a
subsequent query to fetch the children

I am worried that 1) will not perform optimally, because the dummy boolean should-clause may retrieve all children, not just the children of the top parent hits.
Approach 2), on the other hand, needs an extra round trip but should otherwise be cheap.

any thoughts?

mvg · June 2, 2016, 2:06pm

Approach 1) will not retrieve all child docs, because inner hits are only fetched for the top matching parent hits that are actually returned. The overhead here is that a has_child query is used, and this performs a join. However, if the other query is something like a range, match or term query that filters down the number of parent matches, then the cost of the join is acceptable.
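
For reference, a rough sketch of what the request body for approach 1) could look like, built as a plain Python dict so it can be passed to whatever client you use. The type name `child`, the helper name and the default sizes are made up for illustration, and a `must` clause is used where a `filter` clause would also work if the main query does not need to score.

```python
import json


def parents_with_children_body(main_query, child_type, from_=0, size=100,
                               children_per_parent=10):
    """Build a search body for approach 1).

    The main query goes into a must clause; a dummy has_child query in a
    should clause exists only to carry the inner_hits definition.
    All parameter names and defaults here are hypothetical.
    """
    return {
        "from": from_,
        "size": size,
        "query": {
            "bool": {
                "must": [main_query],
                "should": [
                    {
                        "has_child": {
                            "type": child_type,
                            "query": {"match_all": {}},
                            "inner_hits": {"size": children_per_parent},
                        }
                    }
                ],
            }
        },
    }


# Example: match-all on parents, children attached via inner hits.
body = parents_with_children_body({"match_all": {}}, "child")
print(json.dumps(body, indent=2))
```

Because the should clause sits next to a must clause, parents without any children should still match; they simply come back without inner hits.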

If you're okay with an extra round trip, then approach 2) is a good way to avoid using a has_child query.

drallax · June 2, 2016, 2:36pm

Regarding 1):
In our case, the other query can be any query and can match (tens of) millions of parents,
but our app only needs the top 100 or so hits, including any children.
From your answer, I do not quite get whether this will perform as quickly as 2).
Do I understand correctly that the inner hits implementation is capable of fetching
the children for just the 100 top hits and no other hits?

What about when the search request has a from parameter, for example, requesting a slice of 100 hits
starting at the 1000th hit? Will inner hits just fetch the children for the 100 returned parents, or will it perform more work?

mvg · June 2, 2016, 3:55pm

Inner hits are only fetched for parent hits that are actually being returned. So if size is 10, from is 100 and there are 1M total hits found, then the inner hits will only be included for the 10 hits being returned.
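
The arithmetic in that example can be written down as a tiny helper. This is only a simplified model of the statement above (inner hits are attached to the returned page of parents and nothing else); the function name is made up and this is not how Elasticsearch computes anything internally.

```python
def parents_with_inner_hits(total_hits, from_, size):
    """Number of parent hits on the returned page, i.e. the number of
    parents for which inner hits are fetched (simplified model)."""
    return max(0, min(size, total_hits - from_))


# The example from the post: size 10, from 100, 1M total hits.
n = parents_with_inner_hits(1_000_000, 100, 10)
print(n)  # 10
```

The same model gives 100 for a slice of 100 hits starting at hit 1000, regardless of how many millions of parents match.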

I think the second approach will be the best approach here, since you only need the top matching children for each returned parent document. The extra round trip is likely to take less time than performing the join with the has_child query, which is actually overkill, because you don't need this join to begin with (since you don't query or aggregate on child fields).
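
A minimal sketch of the second round trip in approach 2), again as plain Python dicts with no client attached. It assumes the `parent_id` query is available in your version and that child hits carry a `_parent` entry in their metadata; the type name `child`, the helper names and the canned responses below are all made up for illustration.

```python
from collections import defaultdict


def children_query_body(parent_ids, child_type, size=1000):
    """Second request of approach 2): fetch children of the given parents.

    One parent_id clause per parent, combined in a bool/should, so a child
    matches if it belongs to any of the returned parents.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"parent_id": {"type": child_type, "id": pid}}
                    for pid in parent_ids
                ]
            }
        },
    }


def group_children_by_parent(children_response):
    """Group child hits by the parent ID found in the hit metadata
    (assumed to be exposed under the "_parent" key)."""
    grouped = defaultdict(list)
    for hit in children_response["hits"]["hits"]:
        grouped[hit["_parent"]].append(hit)
    return dict(grouped)


# Canned response of the first (parents-only) search, for illustration.
parents_response = {"hits": {"hits": [{"_id": "p1"}, {"_id": "p2"}]}}
parent_ids = [hit["_id"] for hit in parents_response["hits"]["hits"]]
body = children_query_body(parent_ids, "child")

# Canned response of the second search.
children_response = {"hits": {"hits": [
    {"_id": "c1", "_parent": "p1"},
    {"_id": "c2", "_parent": "p1"},
    {"_id": "c3", "_parent": "p2"},
]}}
grouped = group_children_by_parent(children_response)
```

The `size` of the second request caps the total number of children across all parents, so it may need tuning if some parents have many children.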

drallax · June 2, 2016, 4:49pm

Thanks, exactly the info I was looking for
