I searched the forum and the internet in general, but I couldn't find clear
answers about the differences in scoring, and most of the answers are
pretty old. I would like to know all the important current
differences between nested and parent-child documents.
What I understood so far:
Parent and child are two different Lucene docs (and guaranteed to be in the
same shard). Nested docs are stored as separate docs using some internal
representation, and they are also in the same shard as the parent doc.
Using nested documents gives significantly better performance compared to
parent-child documents.
Any update to a nested document will trigger the whole parent document
to be reindexed, but any update to a child will reindex only the child doc.
When you apply a filter on a nested field, the filter will work, but all
the nested docs will be returned along with the parent (it's a feature in
progress: https://github.com/elasticsearch/elasticsearch/issues/3022 ). We
do not have this problem with parent-child.
Questions, or points where I need to confirm my understanding:
Using nested documents will let me sort the documents based on fields in
the nested documents; on the other hand, I cannot sort by fields in child
docs (feature in progress: https://github.com/elasticsearch/elasticsearch/issues/2917 ).
Filtering results based on a field is possible with both nested and
parent-child documents.
I am curious to know other differences from a ranking/scoring perspective. I
would ideally like to score the parent documents by an aggregate function
(sum or avg) of a nested/child field. Any thoughts, anyone?
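For anyone landing here with the same scoring question, here is a rough sketch of how "score the parent by sum or avg of a nested/child field" might be written as query bodies. It is not a confirmed recipe from this thread. The field names (`comments`, `rating`) and the child type (`comment`) are made up for illustration, and option names such as `score_mode` have varied between Elasticsearch releases, so check the reference for your version.

```python
import json


def _field_as_score(field):
    """Inner query that scores each matching doc by the value of `field`."""
    return {
        "function_score": {
            "query": {"match_all": {}},
            "field_value_factor": {"field": field},
            # Replace the match_all score with the field value itself.
            "boost_mode": "replace",
        }
    }


def nested_score_query(path, field, score_mode="avg"):
    """Score each parent by an aggregate (avg, sum, ...) over its nested docs."""
    return {
        "query": {
            "nested": {
                "path": path,                # hypothetical nested field
                "score_mode": score_mode,    # how nested scores are combined
                "query": _field_as_score(f"{path}.{field}"),
            }
        }
    }


def has_child_score_query(child_type, field, score_mode="avg"):
    """Same idea for parent/child: aggregate child scores onto the parent."""
    return {
        "query": {
            "has_child": {
                "type": child_type,          # hypothetical child type
                "score_mode": score_mode,
                "query": _field_as_score(field),
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(nested_score_query("comments", "rating", "sum"), indent=2))
```

The two helpers only build dictionaries; sending them to a cluster is left out on purpose, since the client call depends on the version in use.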
Your understanding is correct. To add more to it, nested documents are
stored in contiguous blocks in the index, making it very fast to resolve
the parent given a child and vice-versa. On the other hand, for parent/child
there is a sort of hash table maintained on top of the index to match
parents with children. This makes indexing more flexible but search much
slower.
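To make the difference concrete, here is a toy model in plain Python. It is only an illustration, not the actual Lucene or Elasticsearch data structures. It assumes each nested block is written with the children first and the parent last, and all doc ids and names are invented.

```python
from bisect import bisect_left
from collections import defaultdict

# Nested layout (assumed): children first, parent last, all contiguous.
# "c" marks a nested doc, "P" marks the parent that closes the block.
docs = ["c", "c", "P", "c", "P", "c", "c", "c", "P"]
parent_ids = [i for i, kind in enumerate(docs) if kind == "P"]


def parent_of(child_id):
    """The parent is simply the next parent doc id after the child."""
    return parent_ids[bisect_left(parent_ids, child_id)]


def children_of(parent_id):
    """Children are the doc ids between the previous parent and this one."""
    pos = bisect_left(parent_ids, parent_id)
    start = parent_ids[pos - 1] + 1 if pos > 0 else 0
    return list(range(start, parent_id))


# Parent/child: docs can sit anywhere in the shard, so a separate lookup
# table has to be built and kept around to join them at search time.
child_rows = [("c1", "p1"), ("c2", "p2"), ("c3", "p1")]  # (child, parent)
children_by_parent = defaultdict(list)
for child_id, parent_id in child_rows:
    children_by_parent[parent_id].append(child_id)

if __name__ == "__main__":
    print(parent_of(3), children_of(8), children_by_parent["p1"])
```

In the nested case the relationship falls out of doc id order alone, while the parent/child case needs the extra table, which is where the additional memory and lookup cost comes from.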
About your questions:
Indeed you cannot sort by fields of a child doc.
Correct.
My recommendation would be to only use parent/child when nested documents
are not applicable. Nested documents are much faster and more
memory-efficient at search time. But sometimes the need to reindex all
nested documents on every update may prove impractical, in which case
parent/child might be an alternative.
On Tue, Jun 17, 2014 at 8:49 PM, Srinivasan Ramaswamy ursvasan@gmail.com
wrote: