Choosing Parent/Child vs Nested Document


(sathis) #1

I searched through this forum and the ES wiki but wasn't able to find
proper details, so I'm posting here. I will list some scenarios for
choosing Parent/Child over the Nested Document model and vice versa. Feel
free to add new points or correct the existing ones.

Nested Document :

  1. If you need better performance. With nested documents, the parent
    and child always reside in the same shard, i.e. in nearby physical
    locations.
  2. If you want to get both the child documents and the parent in a
    single query (I'm not sure whether this is supported by Parent/Child).

Parent / Child :

  1. If you want to avoid data duplication. For many-to-many
    relationships you have to embed the same sub-documents repeatedly if
    you use the Nested Document model. This can be avoided by using a
    parent/child relationship.

Thanks & Regards
Sathis


(Shay Banon) #2

With both parent/child and nested documents, the relevant data resides in the same shard. The main difference is that nested docs are faster than parent/child, but nested docs require reindexing the parent with all its children, while parent/child allows you to reindex / add / delete specific children.
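
One rough way to picture this trade-off is to count how many documents get rewritten when a single child is added. The sketch below is a toy model, not Elasticsearch code: the function names are made up for illustration, and it assumes the root document and each nested object or child count as one document each.

```python
def docs_rewritten_nested(existing_children: int) -> int:
    """Nested model: adding one child reindexes the root doc and every nested doc."""
    root = 1
    new_child = 1
    return root + existing_children + new_child


def docs_rewritten_parent_child(existing_children: int) -> int:
    """Parent/child model: only the new child is indexed; parent and siblings are untouched."""
    return 1


if __name__ == "__main__":
    # Compare the two models for a growing number of existing children.
    for n in (0, 10, 1000):
        print(n, docs_rewritten_nested(n), docs_rewritten_parent_child(n))
```

Under these assumptions the write cost of the nested model grows with the number of existing children, while the parent/child cost stays constant. Query-time cost, where nested is the faster of the two, is not modeled here.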


(sathis) #3

Thanks, Shay, for clarifying this. You are doing a wonderful job for the
open source community. If you tag some issues as junior jobs, it will help
new developers like me get involved.


(Karel Minarik) #4

Shay, do I understand correctly that the update API
[http://www.elasticsearch.org/guide/reference/api/update.html] makes it
possible to append "nested" documents to the "main" document, such as
adding a tag or comment to an article, without having to send the full
document over the wire? (And also without potential conflicts with
concurrent updates?)

Karel


(haarts) #5

I'm not Shay (not by a long shot) but I do know the answer to that one!
Your assumption is correct. An update allows you to append stuff to the
document without sending the entire document over the wire. It also handles
conflicts (see the retry_on_conflict argument). But the entire document is
still reindexed, which can be rather expensive. Adding tags probably
wouldn't be a problem. Adding (hundreds of) comments to a blog post would be.

Harm
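
For illustration, an update request of the kind described above might be built as follows. This is a minimal sketch that only constructs the request and sends nothing. The index, type and field names (`blog`, `article`, `comments`) are hypothetical, and the script syntax follows the update API documentation from around the time of this thread, so newer releases may expect a different form.

```python
import json
from urllib.parse import urlencode


def build_append_comment_update(index, doc_type, doc_id, comment, retries=3):
    """Return (method, path, body) for a scripted update that appends one comment."""
    # retry_on_conflict asks the server to retry if a concurrent write
    # changed the document between the internal get and the reindex.
    query = urlencode({"retry_on_conflict": retries})
    path = f"/{index}/{doc_type}/{doc_id}/_update?{query}"
    body = {
        # Hypothetical field name "comments"; script form as documented at the time.
        "script": "ctx._source.comments += comment",
        "params": {"comment": comment},
    }
    return "POST", path, json.dumps(body)


if __name__ == "__main__":
    method, path, body = build_append_comment_update(
        "blog", "article", "1", {"author": "karel", "text": "Nice post"}
    )
    print(method, path)
    print(body)
```

Only the new comment travels over the wire, but the server still rewrites the whole document, which is the cost mentioned above.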


(Shay Banon) #6

++Harm


(btiernay) #7

I'm curious as to why nested is faster compared to parent/child since my
understanding is that nested is implemented similarly to parent/child. That
is, they both store nested/children documents independently. Any insight
would be greatly appreciated.


(Clinton Gormley) #8

On Fri, 2013-01-18 at 04:28 -0800, btiernay wrote:

I'm curious as to why nested is faster compared to parent/child since
my understanding is that nested is implemented similarly to
parent/child. That is, they both store nested/children documents
independently. Any insight would be greatly appreciated.

Parents and children are top-level completely separate docs which are
stored on the same shard.

Nested docs are independent docs, but not visible on the top level. As I
understand it, the root doc and its nested docs are written in an
efficient format (block indexing) which improves performance.

clint
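
The block layout mentioned above can be pictured with a small toy model. This is a sketch under the assumption that a root doc and its nested docs are written as one contiguous run, with the nested docs placed right before their root. The class and function names are invented for illustration and are not Lucene or Elasticsearch APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LuceneDoc:
    kind: str      # "root" or "nested"
    payload: dict


def index_block(segment: List[LuceneDoc], root: dict, nested: List[dict]) -> None:
    """Append the nested docs followed immediately by their root doc, as one block."""
    for child in nested:
        segment.append(LuceneDoc("nested", child))
    segment.append(LuceneDoc("root", root))


def nested_docs_of(segment: List[LuceneDoc], root_pos: int) -> List[dict]:
    """Collect a root's nested docs by walking backwards until the previous root."""
    found = []
    pos = root_pos - 1
    while pos >= 0 and segment[pos].kind == "nested":
        found.append(segment[pos].payload)
        pos -= 1
    found.reverse()  # restore the original indexing order
    return found


if __name__ == "__main__":
    segment: List[LuceneDoc] = []
    index_block(segment, {"title": "a"}, [{"c": 1}, {"c": 2}])
    index_block(segment, {"title": "b"}, [{"c": 3}])
    print(nested_docs_of(segment, 2))
    print(nested_docs_of(segment, 4))
```

Because the related docs sit next to each other, joining them needs no lookup by parent ID. Parent and child docs, being separate top-level docs, can land anywhere in the shard.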


(Jayant Kerai) #9

I have a related question.

I want to index files (using something like Tika) but I also have a DB
holding custom meta data against each file (like the owner, tags, etc).
I would want to do a partial update to the meta data but don't want ES to
do a full re-index of the content.

If I have a parent document that holds all the DB meta data and have a
child document which holds the content from Tika, then I want to do a
partial update on the parent (i.e. add some new tags), am I right to assume
that the content document (the child) will not change or be re-indexed
under the hood when using parent/child setup?

I know that using nested documents would require a re-index of everything.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Clinton Gormley) #10

If I have a parent document that holds all the DB meta data and have a
child document which holds the content from Tika, then I want to do a
partial update on the parent (i.e. add some new tags), am I right to
assume that the content document (the child) will not change or be
re-indexed under the hood when using parent/child setup?

Correct. The parent and child docs are completely separate.

clint
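
To make the scenario concrete, here is a hedged sketch of the requests involved. It only builds request tuples and sends nothing. The index and type names (`files`, `meta`, `content`) are hypothetical, and the `_parent` mapping and `parent` request parameter reflect the parent/child API of the releases current at the time of this thread, which later releases changed.

```python
import json


def child_mapping(child_type, parent_type):
    """Mapping that declares child_type as a child of parent_type."""
    return {child_type: {"_parent": {"type": parent_type}}}


def index_child_request(index, child_type, child_id, parent_id, source):
    """Index a child doc; the parent parameter routes it to the parent's shard."""
    path = f"/{index}/{child_type}/{child_id}?parent={parent_id}"
    return "PUT", path, json.dumps(source)


def update_parent_tags_request(index, parent_type, parent_id, tag):
    """Scripted update on the parent only; no child doc is referenced or rewritten."""
    path = f"/{index}/{parent_type}/{parent_id}/_update"
    body = {"script": "ctx._source.tags += tag", "params": {"tag": tag}}
    return "POST", path, json.dumps(body)


if __name__ == "__main__":
    print(child_mapping("content", "meta"))
    print(index_child_request("files", "content", "42-body", "42", {"text": "..."}))
    print(update_parent_tags_request("files", "meta", "42", "invoice"))
```

The update request addresses only the parent's ID, so the (potentially large) Tika content held in the child is left alone.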


(Jayant Kerai) #11

Excellent. Thanks Clint.


(Amit Jayee) #12

Nested is faster than parent/child because internally nested documents are stored in the same Lucene block as their root document, so the data is available locally. With parent/child, the data is in the same shard but not necessarily in the same Lucene block.

