Nested vs Parent-Child - index and search side differences


(Srinivasan Ramaswamy) #1

I searched the forum and the internet in general, but I couldn't find clear
answers about the differences in scoring, and most of the answers are
pretty old. I would like to know all the important current differences
between nested and parent-child documents.

What I have understood so far:

  1. Parent and child are two different Lucene docs (and are guaranteed to be
    in the same shard). Nested docs are also stored as separate docs, using
    some internal representation, and they too are in the same shard as the
    parent doc.
  2. Using nested documents gives significantly better performance than
    using parent-child documents.
  3. Any update to a nested document triggers a reindex of the whole parent
    document, but an update to a child reindexes only the child doc.
  4. When you apply a filter on a nested field, the filter works, but all
    the nested docs are returned along with the parent (it's a feature in
    progress: https://github.com/elasticsearch/elasticsearch/issues/3022 ).
    We do not have this problem with parent-child.
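Point 3 can be pictured with a toy count of how many Lucene documents get rewritten by a single update. This is only a sketch of the behaviour described in the list; the function names are hypothetical and are not Elasticsearch APIs.

```python
# Toy model only: counts Lucene documents rewritten by one update.
# Both function names are made up for illustration.

def docs_rewritten_nested(num_nested: int) -> int:
    """Updating one nested object reindexes the parent document
    together with every nested document stored alongside it."""
    return 1 + num_nested


def docs_rewritten_parent_child() -> int:
    """Updating one child reindexes only that child document."""
    return 1


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, docs_rewritten_nested(n), docs_rewritten_parent_child())
```

The nested cost grows with the number of nested docs per parent, while the parent-child cost stays constant.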

Questions, or points where I need to confirm my understanding:

  1. Using nested documents lets me sort the documents based on fields in
    the nested documents; on the other hand, I cannot sort by fields in child
    docs (feature in progress:
    https://github.com/elasticsearch/elasticsearch/issues/2917).
  2. Filtering results based on a field is possible with both nested and
    parent-child documents.
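For question 1, a search body that sorts parents by a nested field might look roughly like the sketch below. The field names `offers` and `offers.price` are hypothetical, and the `nested` option inside `sort` is the syntax of recent Elasticsearch releases (older releases, including those current when this thread was written, used a `nested_path` key instead), so check it against your version.

```python
import json

# Sort modes accepted for multi-valued / nested numeric fields.
SORT_MODES = {"min", "max", "sum", "avg", "median"}


def nested_sort_body(path, field, mode="avg", order="desc"):
    """Build a search body that sorts root documents by an aggregate
    (mode) of a field inside their nested objects."""
    if mode not in SORT_MODES:
        raise ValueError(f"unsupported sort mode: {mode}")
    return {
        "query": {"match_all": {}},
        "sort": [
            {field: {"order": order, "mode": mode, "nested": {"path": path}}}
        ],
    }


if __name__ == "__main__":
    # Hypothetical field names; send the printed JSON to the _search endpoint.
    print(json.dumps(nested_sort_body("offers", "offers.price"), indent=2))
```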

I am curious to know about other differences from a ranking/scoring
perspective. Ideally, I would like to score the parent documents by an
aggregate function (sum or avg) of a nested/child field. Any thoughts, anyone?
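One way to approach the sum/avg scoring might be the `score_mode` option of the `nested` and `has_child` queries, with the inner query arranged so that each nested/child doc scores as its field value. The sketch below only builds the request bodies. The names `offers`, `offer` and `rating` are hypothetical, and the exact option names depend on the Elasticsearch version (older releases called the `has_child` option `score_type`, and `field_value_factor` is not available in the earliest 1.x releases).

```python
import json

SCORE_MODES = {"avg", "sum", "max", "min", "none"}


def _field_as_score(field):
    """Inner query whose score is the numeric value of `field`."""
    return {
        "function_score": {
            "query": {"match_all": {}},
            "field_value_factor": {"field": field},
            "boost_mode": "replace",  # use the field value instead of the query score
        }
    }


def nested_score_query(path, field, score_mode="avg"):
    """Score each root doc by an aggregate of `field` over its nested docs."""
    if score_mode not in SCORE_MODES:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    return {"query": {"nested": {
        "path": path, "score_mode": score_mode, "query": _field_as_score(field)}}}


def has_child_score_query(child_type, field, score_mode="avg"):
    """Score each parent doc by an aggregate of `field` over its children."""
    if score_mode not in SCORE_MODES:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    return {"query": {"has_child": {
        "type": child_type, "score_mode": score_mode, "query": _field_as_score(field)}}}


if __name__ == "__main__":
    print(json.dumps(nested_score_query("offers", "offers.rating", "sum"), indent=2))
    print(json.dumps(has_child_score_query("offer", "rating", "avg"), indent=2))
```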

Thanks
Srini

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8183aa8d-1efc-40e5-8555-120bca8ff426%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

Your understanding is correct. To add to it, nested documents are
stored in contiguous blocks in the index, making it very fast to resolve
the parent given a child and vice versa. For parent/child, on the other
hand, there is a sort of hash table maintained on top of the index to match
parents with children. This makes indexing more flexible but search much
slower.
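To make the difference concrete, here is a toy model of the two lookups. It assumes the block layout used by Lucene's block join (children indexed first, parent as the last doc of each block); the doc ids and the lookup table are made up, and none of this is actual Elasticsearch code.

```python
from bisect import bisect_left

# One segment holding two blocks; "c" = nested doc, "P" = parent doc.
# doc ids:  0    1    2    3    4
segment = ["c", "c", "P", "c", "P"]
parent_ids = [i for i, kind in enumerate(segment) if kind == "P"]


def parent_of(child_doc):
    """Nested: the parent is simply the next parent doc id in the segment."""
    return parent_ids[bisect_left(parent_ids, child_doc)]


def children_of(parent_doc):
    """Nested: the children are the docs between the previous parent and this one."""
    idx = bisect_left(parent_ids, parent_doc)
    start = parent_ids[idx - 1] + 1 if idx > 0 else 0
    return list(range(start, parent_doc))


# Parent/child: docs can live anywhere in the shard, so a separate
# in-memory table (hypothetical ids) has to map each child to its parent.
parent_by_child_id = {"offer-1": "product-1", "offer-2": "product-1"}

if __name__ == "__main__":
    print(parent_of(0), parent_of(3))
    print(children_of(2), children_of(4))
    print(parent_by_child_id["offer-2"])
```

The nested lookups need only doc-id arithmetic, whereas the parent/child table must be built and kept in memory, which is the cost Adrien refers to.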

About your questions:

  1. Indeed you cannot sort by fields of a child doc.
  2. Correct.

My recommendation would be to use parent/child only when nested documents
are not applicable. Nested documents are much faster and more
memory-efficient at search time. But sometimes the need to reindex all
nested documents may prove impractical, in which case parent/child can be
an alternative.

(In reply to Srinivasan Ramaswamy's post of Tue, Jun 17, 2014, 8:49 PM.)

--
Adrien Grand


