Parent/Child use case

Paul_Smith · September 19, 2011, 12:41am

Just wanted to check whether this scenario is a good fit for the parent/child
mapping feature.

We currently index just the meta-data of documents (dozens of fields), but
we want to index file contents too, as that's sometimes useful for our
customers (in our use case, the meta-data is the primary search mechanism).
Since we have hundreds of millions of document records and 100 TB+ of files,
it's a non-trivial exercise we've managed to put off for a while.

Since any reindex requires indexing both the meta-data and the file content,
which for us are kept separately in a DB and a fileserver respectively, I didn't
want every meta-data update to also require a seek on the filestore to get the
text content for indexing, especially since the text content of a file never
changes (for us). I was hoping to find a way to keep meta-data and text
content separate in the index, so that meta-data updates can be applied
independently.

I was thinking of having a parent/child relationship between the
meta-data (parent) and the full text (child), allowing the parent to be
updated (frequently) while leaving the child pretty much alone once the text
is extracted and indexed. Text extraction of newly uploaded files can be done
asynchronously, and a new child record added in ES independently of the
registration of the document's meta-data record.
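To make the proposed layout concrete, here is a minimal Python sketch that only builds request paths and bodies (it does not talk to a cluster). It follows the 0.x-era parent/child syntax from the GitHub issue quoted later in this thread: a `_parent` entry in the child mapping, and a `parent` URL parameter when indexing the child. The index name `documents`, the type names `doc` and `filecontent`, the `contents` field, and reusing the parent id as the child id are all hypothetical choices for illustration.

```python
import json

# Hypothetical names, for illustration only.
INDEX = "documents"
PARENT_TYPE = "doc"
CHILD_TYPE = "filecontent"

# Child mapping in the 0.x-era syntax from the parent/child issue:
# the child type declares its parent type via "_parent".
child_mapping = {CHILD_TYPE: {"_parent": {"type": PARENT_TYPE}}}


def parent_request(doc_id, metadata):
    """(method, path, body) for indexing or re-indexing a parent's meta-data."""
    return ("PUT", f"/{INDEX}/{PARENT_TYPE}/{doc_id}", json.dumps(metadata))


def child_request(doc_id, text):
    """(method, path, body) for indexing extracted text as a child of doc_id.

    Reusing the parent id as the child id is an assumption that keeps the
    1:1 relationship easy to address; the ?parent= parameter is what routes
    the child to the parent's shard.
    """
    path = f"/{INDEX}/{CHILD_TYPE}/{doc_id}?parent={doc_id}"
    return ("PUT", path, json.dumps({"contents": text}))
```

With this split, a meta-data change only ever produces a `parent_request`; the `child_request` is issued once, after the asynchronous text extraction finishes.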

Does this sound like the right use case for parent/child in ES?

As I understand it, if we needed to reindex (say, for new fields, changed
values or something) then we'd also have to reindex the children, but we
could do these two reindex operations separately, marking the full-text part
'offline' until that's completed and allowing the meta-data to be searched
much earlier.

Shay, it would also help if the 'parent/child' feature that is referred to
so frequently were easy to find in the docs. I'm presuming it's the 'nested'
mapping type? I see references to parent/child in the forums etc., but it
took me a while to bump into it in the docs when I went looking. I could be
blind though!

thanks,

Paul

ppearcy · September 20, 2011, 10:20pm

Hey Paul,
First off, the parent/child support and the nested support are
actually two different features. The nested support seems to be the
more feature-rich variant, though.

I was interested in both these features for storing a frequently
changing popularity score without needing to re-index the document.
Unfortunately, neither of these features fits my use case. Parent/child
documents can be indexed separately, but it isn't possible to join the child
document when sorting the parent. For nested, all data needs to be
re-indexed due to how the data is stored internally in ES.

We avoid having to hit our backend datastore for meta-data updates by
pulling the data from ES, updating it and resubmitting it. There is also
a new plugin that does exactly this. Either way, the whole document
needs to get re-indexed.
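The fetch-modify-resubmit approach amounts to rebuilding the full source locally before re-indexing it. A minimal Python sketch, assuming `current_source` is the `_source` already fetched from ES and `changes` holds only the modified top-level fields (the function name and the example fields are hypothetical); the plugin mentioned above is not modeled:

```python
import copy


def updated_source(current_source, changes):
    """Return the full document to re-index after applying `changes`.

    `current_source` stands for the `_source` fetched from ES; `changes`
    holds only the top-level fields being modified. The result is the
    *whole* document, since the whole document gets re-indexed anyway.
    """
    new_source = copy.deepcopy(current_source)  # leave the fetched copy untouched
    new_source.update(changes)
    return new_source
```

The returned dict is what gets PUT back under the same id; nothing in it is a partial patch.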

Don't take this as the definitive answer, though; I've only briefly
played around with both these features.

Best Regards,
Paul


Paul_Smith · September 20, 2011, 11:21pm

Thanks for the reply!

On 21 September 2011 08:20, ppearcy ppearcy@gmail.com wrote:

> Hey Paul,
> First off, the parent/child support and the nested support are
> actually two different features. The nested support seems to be the
> more feature rich variant, though.

(scratches head) So is there a page on elasticsearch.com that documents
parent/child? I must be going blind; I could only find the 'nested' one.

> I was interested in both these features for storing a frequently
> changing popularity score without needing to re-index the document.
> Unfortunately, neither of these features fit my use case. Parent/child
> can be indexed separately, but it isn't possible to join the child
> document for the sorting of the parent. For nested, all data needs to
> be re-indexed due to how the data is stored internally in ES.

Regarding the '... it isn't possible to join the child document for the
sorting of the parent' part: I don't need to sort by any child value in this
case. I just need to be able to match on text in the child sometimes and
return the parent as the hit.

So, if the parent ES document has a field called "documentnumber", and the
child holds the text contents of the file attached to this parent in a field
called "contents", then if I search for:

documentnumber:ABC-123 OR contents:foo

then the results should return any parent whose documentnumber field matches,
PLUS any parents whose children have 'foo' in their contents.
I then sort by one of the parent's fields.

Would this work? I'm hoping to optionally allow the customer to search the
contents of the file, but return those matches 'inline' with other matches on
the parent meta-data record.
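For reference, the same search could be expressed as a JSON body with a `bool` query whose `should` clauses are a plain `term` on the parent and a `has_child` query on the child type. A sketch in Python that only builds the body; the child type name `filecontent` and the sort field `registered_date` are hypothetical, and the `term` on `documentnumber` assumes that field is indexed `not_analyzed` (an analyzed field would need a text query instead):

```python
import json


def parent_or_child_text_query(docnumber, text, sort_field):
    """Search body for `documentnumber:<docnumber> OR contents:<text>`.

    Hits are parent documents either way: the first clause matches the
    parent's own field, the has_child clause runs against the hypothetical
    "filecontent" child type and yields the parents of matching children.
    """
    return {
        "query": {
            "bool": {
                # With only "should" clauses, a parent matches if either does.
                "should": [
                    {"term": {"documentnumber": docnumber}},
                    {"has_child": {
                        "type": "filecontent",
                        "query": {"term": {"contents": text}},
                    }},
                ]
            }
        },
        # Sort on a parent field; has_child itself does no scoring.
        "sort": [{sort_field: {"order": "asc"}}],
    }


print(json.dumps(parent_or_child_text_query("ABC-123", "foo", "registered_date"), indent=2))
```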

thanks,

Paul

ppearcy · September 21, 2011, 12:27am

Ah... I think that might work. Here is the best overall description
I can find of things:

github.com/elastic/elasticsearch issue "Parent / Child Support", opened by kimchy on 07 Dec 2010 and closed on 08 Dec 2010 (labels: feature, v0.14.0)

The parent/child documents support allows defining a parent relationship from a child type to a parent type.

## Mapping

The relationship is defined using a simple mapping definition at the child level mapping. For example, in case of a `blog` type and a `blog_tag` type child document, the mapping for `blog_tag` should be:

```
{
    "blog_tag" : {
        "_parent" : {
            "type" : "blog"
        }
    }
}
```

The above defines a parent mapping, and the type of the parent.

## Indexing

When indexing a child document, it is important that it is routed to the same shard as the parent. This uses the routing capability. When indexing a doc with a parent id, the parent id is automatically set as the routing value (unless the routing value is explicitly defined). Indexing a document with a parent id is simple:

```
curl -XPUT 'localhost:9200/blogs/blog_tag/1122?parent=1111' -d '
{
    "tag" : "something"
}
'
```

There is an option to set `_parent` in each bulk index item as well.

## Querying

There are several mechanisms to query child documents. The idea of a child filter / query is that its inner query is run against the child documents, and the results are the parent docs matching those child documents. The way it is implemented is that the child queries are first run on their own, with the results "joining" the parent documents. Then, the main query runs with the results of the child query, which includes the parent docs.

### `has_child`

The first is the `has_child` filter and `has_child` query (which is a simple `constant_score` query wrapping the `has_child` filter):

```
{
    "has_child" : {
        "type" : "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
```

The `type` is the child type to query against. The parent type to return is automatically detected based on the mappings. The query (and filter) do no scoring, and the "join" process of matching which parent doc the child doc matches is done _on each matching child doc_.

### `top_children`

The `top_children` query basically runs the child query with an estimated hits size, and out of these hit docs, aggregates them into parent docs. If there aren't enough parent docs matching the requested from/size search request, then it is run again with a wider (more hits) search.

The `top_children` query also provides scoring capabilities, with the ability to specify `max`, `sum` or `avg` as the `score` type.

One downside of using `top_children` is that if more child docs match than the number of hits requested when executing the child query, then the `total_hits` result of the search response will be incorrect.

How many hits are asked for in the first child query run is controlled using the `factor` parameter (defaults to `5`). For example, when asking for 10 docs with from 0, the child query will execute with 50 hits expected. If not enough parents are found (in our example, 10), and there are still more child docs to query, then the search hits are expanded by multiplying by the `incremental_factor` (defaults to `2`).

The required parameters are the `query` and `type` (the child type to execute the query on). Here is an example with all different parameters, including the default values:

```
{
    "top_children" : {
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        },
        "score" : "max",
        "factor" : 5,
        "incremental_factor" : 2
    }
}
```

## Faceting

Faceting on the child query phase (on the results of the query executed) can be done by specifying a `scope` with a custom name in the query / filter. All facets now accept a `scope` to run on (similar to `global` set to `true`), and can now be executed on docs matching the child query.

## Query Performance

In general, the `top_children` performance will be much better than the `has_child` performance. This is because joining the child to its parent is done in the `top_children` case against the expected number of hits returned, while in the `has_child` case, it is executed against _all_ child docs matching the child query.

## Memory Considerations

With the current implementation, all `_id` values are loaded to memory (heap) in order to support fast lookups, so make sure there is enough memory for it.
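The mechanism described in the issue (children routed to their parent's shard, then `has_child` checking every matching child against its parent) can be modeled in a few lines. This is a toy in-memory sketch in Python, not ES internals: `ToyIndex`, `shard_for`, the CRC32 hash and the shard count are stand-ins chosen for illustration.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # arbitrary shard count for the toy


def shard_for(routing):
    # Stand-in hash (CRC32), NOT the hash ES uses; only the property
    # "same routing value -> same shard" matters here.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


class ToyIndex:
    def __init__(self):
        self.parents = defaultdict(dict)   # shard -> {parent_id: source}
        self.children = defaultdict(list)  # shard -> [(parent_id, source)]

    def index_parent(self, parent_id, source):
        # Parents are routed by their own id.
        self.parents[shard_for(parent_id)][parent_id] = source

    def index_child(self, parent_id, source, routing=None):
        # The parent id doubles as the routing value unless routing is
        # given explicitly, so by default the child lands on the parent's shard.
        shard = shard_for(routing if routing is not None else parent_id)
        self.children[shard].append((parent_id, source))

    def has_child(self, child_matches):
        """Ids of parents with at least one child where child_matches(source) is true.

        Every matching child is checked, and only against parents on the
        same shard, mirroring the join behaviour the issue describes.
        """
        hits = set()
        for shard, kids in self.children.items():
            local_parents = self.parents.get(shard, {})
            for parent_id, source in kids:
                if child_matches(source) and parent_id in local_parents:
                    hits.add(parent_id)
        return hits


idx = ToyIndex()
idx.index_parent("1111", {"title": "a blog"})
idx.index_child("1111", {"tag": "something"})
print(idx.has_child(lambda s: s["tag"] == "something"))
```

The per-shard lookup is why routing matters: a child indexed with an explicit `routing` that sends it to a different shard is never joined to its parent in this model.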

And here are the relevant query types that can be run:

has_child
top_children -> I think you'd want this one since it scores

Each one seems to query the children and return details on the parent.
Since you have a 1-to-1 mapping of parent to child, I think
top_children would work correctly.
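Whether `top_children` behaves well for a given parent/child ratio can be reasoned about with a toy version of the expansion loop from the issue: fetch `(from + size) * factor` child hits, group them by parent, and widen the window by `incremental_factor` until enough parents are found or the children run out. A Python sketch under those assumptions; the function and its argument names are hypothetical, and ES's real implementation differs in detail:

```python
from collections import defaultdict


def top_children(child_hits, from_=0, size=10, factor=5,
                 incremental_factor=2, score="max"):
    """Toy model of the top_children expansion loop (not ES internals).

    child_hits: (parent_id, child_score) pairs sorted by score descending,
    standing in for the full result of the child query.
    Returns (page of (parent_id, parent_score), total_hits).
    """
    wanted = from_ + size
    window = max(1, wanted * factor)  # first run: (from + size) * factor hits
    while True:
        scores = defaultdict(list)
        for parent_id, child_score in child_hits[:window]:
            scores[parent_id].append(child_score)
        exhausted = window >= len(child_hits)
        if len(scores) >= wanted or exhausted:
            break
        # Not enough distinct parents yet: widen the child search and retry.
        window = max(window + 1, window * incremental_factor)

    combine = {"max": max, "sum": sum,
               "avg": lambda xs: sum(xs) / len(xs)}[score]
    parents = sorted(((pid, combine(xs)) for pid, xs in scores.items()),
                     key=lambda p: p[1], reverse=True)
    # In this model total_hits only counts parents reached inside the final
    # window, so it under-reports when more children match than were fetched.
    return parents[from_:wanted], len(parents)
```

With a 1-to-1 mapping every child hit yields a distinct parent, so the first window already holds enough parents whenever enough children match; `total_hits` can still under-report, because only children inside the window are counted.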

I'd be curious to know if this ends up working for you, since I've
always had trouble wrapping my head around the primary use cases for
this feature.

Best Regards,
Paul


micuenta99 · November 15, 2011, 4:31pm

Hi Paul,
I'm also interested in this scenario ('doc' as the parent and
'filecontent' as the child). How did you end up doing it?
Thanks,

Paul_Smith · November 16, 2011, 5:02am

On 16 November 2011 03:31, micu99 micuenta99@gmail.com wrote:

> Hi Paul,
> I also am interested in this scenario ( 'doc' (parent) and a
> 'filecontent' (child)) At the end how you did it do?

I haven't gotten around to giving this a serious try. I had a
quick go with top_children but got stuck on a syntax error (my own
fault, I'm certain) and then got involved in some other things, so I
haven't gotten back to this one yet, sorry!

micuenta99 · November 16, 2011, 6:45am

Thanks Paul,

Can anyone help?
What is the best way to implement a scenario like the one Paul
describes?

