Returning a combination of parent and child fields


(phill) #1

Can I build a search request that can return a combination of the parent
and child fields?

I was thinking of using a has_child query (filter?) or a top_children
query, which I believe I can use to get the right parents.
But that only helps if the results can include some
fields from the parent and some from the child.

Is this possible? If I used the very new nested object field, would it
be possible to navigate to the children?
I could use nested objects if I had to, but I would rather not, because
it would mean rewriting the parent, which can contain large amounts of
tokenized text.

-Paul

--


(Clinton Gormley) #2

On Thu, 2012-08-23 at 16:54 -0700, P. Hill wrote:

Can I build a search request that can return a combination of the parent
and child fields?

No.

Think of a query in elasticsearch as a filter that all documents are
passed through in a stream. A document either makes it through the
filter, or it doesn't. There is no joining.

Parent-child queries fake joins internally by running a query on the
child docs, then a query filtered by the parent doc ids. But during the
second query, the child docs are not available to extract content.

Is this possible? If I used the very new nested object field would it
be possible to navigate to the children?

With nested docs, you'd have the child fields available in the root
document (if you specified include_in_root or include_in_parent), but child
docs in a nested object are (internally) still separate docs, so you won't
know which child docs actually matched. There is an open issue for this, as
it'd be a very nice feature to have.

You may have to do it in three stages:

  • query the child docs and keep the results around
  • query the parent docs
  • merge the results in your app
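
As a rough illustration of the third stage, here is a minimal Python sketch of merging the two result sets in the application. It assumes each hit has already been reduced to a plain dict, that parent hits carry an "_id", and that each child hit carries its parent's id under a "_parent" key. The function name and the hit shapes are hypothetical, not something the Elasticsearch client provides.

```python
from collections import defaultdict


def merge_parent_child(parent_hits, child_hits, parent_key="_parent"):
    """Attach every matching child hit to its parent hit.

    Assumption: hits are plain dicts; parents have "_id" and children
    store the parent's id under `parent_key`.
    """
    # Stage 1 result: index the child hits by the id of their parent.
    children_by_parent = defaultdict(list)
    for child in child_hits:
        children_by_parent[child[parent_key]].append(child)

    # Stage 2 result: walk the parent hits in their original order and
    # copy each one so the caller's data is left untouched.
    merged = []
    for parent in parent_hits:
        combined = dict(parent)
        combined["children"] = children_by_parent.get(parent["_id"], [])
        merged.append(combined)
    return merged
```

The parent order is preserved, so whatever sorting the parent query applied survives the merge. Parents with no matching child simply get an empty list.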

clint

--


(phill) #3

On 8/23/2012 11:32 PM, Clinton Gormley wrote:

You may have to do it in three stages:

  • query the child docs and keep the results around
  • query the parent docs
  • merge the results in your app

clint

OK, so the result of a query is really a set of doc IDs of the parent type.

But all is not lost. Looking around, I see I CAN find matching child
docs using a child query, then assign them to a scope and access that
scope in a facet.
Setting "query"."nested"."_scope" is mentioned in:
http://www.elasticsearch.org/guide/reference/query-dsl/top-children-query.html
"A _scope can be defined on the query allowing to run facets on the
same scope name that will work against the child documents."

Yeah, child docs!

This is not mentioned as a feature of nested queries on the page:
http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html

But is mentioned in a discussion of nested queries and _scope on the
facets page:
http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
So if I use

  1. nested documents (let's call mine "child" for this discussion)
  2. a denormalized copy, in each nested doc, of something that gets a
    child back to its parent, let's say "parentName"
  3. nested queries with "_scope" on the nested query
  4. "scope" on the facet to work with the nested documents

I believe this would allow me to find the youngest or oldest matching
child and its parent (actually I'm looking for youngest).

    {
      "query" : {
        ...
        "nested" : {
          "_scope" : "myMatchingChildren",
          "query" : { ... }
          ...
        }
      },
      "facets" : {
        "justOldestandYoungestFacet" : {
          "terms_stats" : {
            "key_field" : "child.parentName",
            "value_field" : "child.creationDate"
          },
          "scope" : "myMatchingChildren"
        }
      }
    }

Here key_field is the field created in the child to get me back to the
parent. value_field is a binary datetime stamp, but maybe an ES date field
will work here; I'll have to check.

Now I'll have a set of stats for each parentName that was found in the
query, which I can use to complete your step 3 "merge the results in the
app", having skipped an extra round trip to ask about children and then
parents.

For each parent I'll get facet values for "count, total, sum of squares,
mean (average), minimum, maximum, variance, and standard deviation":
count = how many children matched
min = earliest creationDate, i.e. the oldest child
max = latest creationDate, i.e. the youngest child
The others aren't particularly useful (the average of a set of dates doesn't
have much meaning).
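
To make that merge concrete, here is a small Python sketch that joins the per-parent stats onto the parent hits of the same response. It assumes the response has been parsed into a dict, that the facet's entries sit under "terms" with "term", "count", "min" and "max" keys, and that each parent's _source has a "name" field holding the same value that was denormalized into the children. The function names and the "name" field are hypothetical.

```python
def stats_by_parent(response, facet_name):
    """Index the facet entries by their term (the denormalized parent key)."""
    entries = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry for entry in entries}


def attach_child_stats(response, facet_name, parent_field="name"):
    """Combine each parent hit with the stats of its matching children.

    Assumption: parent hits expose `parent_field` in "_source" and its
    value equals the key_field value stored in the children.
    """
    stats = stats_by_parent(response, facet_name)
    combined = []
    for hit in response["hits"]["hits"]:
        entry = stats.get(hit["_source"][parent_field])
        combined.append({
            "parent_id": hit["_id"],
            "matching_children": entry["count"] if entry else 0,
            # min/max of the value_field, here the children's creation dates
            "earliest_creation": entry["min"] if entry else None,
            "latest_creation": entry["max"] if entry else None,
        })
    return combined
```

A parent without a facet entry gets a count of 0 rather than raising, which matters if the facet returns fewer entries than there are hits.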

If I want all matching children and all their fields, it looks like I
can use

    "allMatchingChildrenFacet" : {
      "terms" : {
        "field" : "child._source"
      },
      "scope" : "myMatchingChildren"
    }

If I want only a few fields, I can kludge some values from the child into
the facet using a script_field (accessing _source or _fields), extending
the script as needed to include any child field:

    "allMatchingChildrenFacet" : {
      "terms" : {
        "script_field" : "_fields['parentName'] + _fields['childName'] + _fields['creationDate']"
      },
      "scope" : "myMatchingChildren"
    }

Script fields are mentioned in terms facets at:
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
But in this case I might have to use _fields['child.parentName'].

The term counts will be 1 for the results of this facet, assuming the
concatenation of the fields I want from the child is unique.

Now ANY APPROPRIATE FIELDS can be re-united with their parent result by
combining the hits with the facet after parsing the "term".

In the above, it may be the case that instead of 'parentName' I can use
the actual '_parent' field of the child document.
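
Parsing the "term" back into fields could look roughly like the Python sketch below. It assumes the script is extended to put a separator between the concatenated values (a "|" here), that the values never contain that separator, and that the field order is parentName, childName, creationDate. The separator and both function names are hypothetical.

```python
def children_from_terms(facet_terms,
                        field_names=("parentName", "childName", "creationDate"),
                        sep="|"):
    """Split each concatenated facet term back into named child fields.

    Assumption: the facet script joined the values with `sep`, in the
    order given by `field_names`.
    """
    children = []
    for entry in facet_terms:
        values = entry["term"].split(sep)
        if len(values) != len(field_names):
            continue  # skip terms that do not have the expected layout
        children.append(dict(zip(field_names, values)))
    return children


def group_children_by_parent(children, key="parentName"):
    """Group the parsed children under the parent value they carry."""
    grouped = {}
    for child in children:
        grouped.setdefault(child[key], []).append(child)
    return grouped
```

The grouped dict can then be looked up once per parent hit. All values come back as strings, so a date would still need converting.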

At this time, the answer to returning a combination of parent and child
fields looks like a query containing a nested or child query, using scope
to process the matching children in a facet.
I will see if I can come up with a gist that demonstrates this through
the REST API, or that shows where I can't get to something and the above
design fails.

-Paul

--


(phill) #4

Oh my! I realize there is a limitation in this hacky solution, in which I get parents using a has_child query and then send child fields back by forcing them into a facet that ends up counting the unique children.

Facets count all values from a query, while the results can be limited to a "page" (a subset of the final result after sorting).

Suppose the search request explicitly asks for only a subset of the results:
    {
      "from" : 20, "size" : 10,
      "query" : { },
      "facets" : { }
    }

Then

  1. the hits will contain 10 objects while
  2. the facet will contain all the objects matched in the query.

(That is, unless I can figure out how to add a facet_filter to the facet that will limit the faceting to the current "page". This doesn't sound at all possible, since from/size is not really part of the process of querying, just a way to trim the final results.)

So if you are "paging" with from/size and looking for only a subset of the total, this facet hack will NOT work; it returns an excess of results.
In that case, you're left with having to build a second query (hopefully re-using some cached filter results from the preceding query) to find information about the child documents.
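
To show the mismatch, here is a small Python sketch that separates the facet entries belonging to parents on the current page from the excess ones. It assumes the facet terms are the denormalized parent key and that each parent hit has the same value in a "name" field of its _source; the function name and that field are hypothetical.

```python
def split_facet_by_page(page_hits, facet_terms, parent_field="name"):
    """Separate facet entries for parents on this page from the excess.

    Assumption: each facet term equals the `parent_field` value of the
    parent it belongs to.
    """
    # Parent keys present in the current page of hits (from/size window).
    on_page = {hit["_source"][parent_field] for hit in page_hits}

    kept = [entry for entry in facet_terms if entry["term"] in on_page]
    excess = [entry for entry in facet_terms if entry["term"] not in on_page]
    return kept, excess
```

Dropping the excess on the client only hides it. The facet is still computed over every match of the query, so this does not remove the cost described above.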

-Paul

