Index nested documents separately?

According to the docs, nested objects are indexed internally as separate
docs.
Is there any use case where it makes sense to additionally index those
nested docs separately?

For example, I have a document type "study" that can reference one or more
documents of type "publication".

{
  "study": {
    "properties": {
      "name": {"type": "string"},
      .....
      "publication": {
        "type": "nested",
        "properties": {
          "name": {"type": "string"},
          "title": {"type": "string"},
          "author": {"type": "string"}
        }
      }
    }
  }
}

However, different studies can reference the same publication.
With nested objects, will the same publication be indexed multiple times in
ES?

I can of course do a nested query if I want to search through the
publication documents, but wouldn't it be more efficient to
additionally index the publication docs separately and run a normal
search? Or is the performance gain not worth it?
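
To make the two options concrete, here is a rough Python sketch of the request bodies I have in mind. It is based on my reading of the nested query docs, so treat the exact shape as an assumption; the function names and the search text are placeholders, and the field names come from the mapping above.

```python
import json


def nested_publication_query(title_text):
    """Search studies through their nested publications (nested query)."""
    return {
        "query": {
            "nested": {
                "path": "publication",
                "query": {"match": {"publication.title": title_text}},
            }
        }
    }


def flat_publication_query(title_text):
    """Search publications that were additionally indexed as top-level docs."""
    return {"query": {"match": {"title": title_text}}}


nested_body = nested_publication_query("placeholder title")
flat_body = flat_publication_query("placeholder title")
print(json.dumps(nested_body, indent=2))
print(json.dumps(flat_body, indent=2))
```

The first body would run against the "study" type and return studies; the second would run against a separate "publication" type and return each publication once.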

Second, does it sometimes make sense to store both directions of a
relationship? If I store the publication docs separately, I could have the
study as a nested object inside the publication document:

{
  "publication": {
    "properties": {
      "title": {"type": "string"},
      .....
      "studies": {
        "type": "nested",
        "properties": {
          "name": {"type": "string"}
        }
      }
    }
  }
}

Or can I cover every use case with just one direction?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Storing a document "separately" (not as a nested doc) is effectively a
Parent/Child setup, which ES supports. A child doc is a
completely separate document that is linked to its parent through special
routing, allowing ES to use queries like has_child, has_parent, etc.

Performance-wise, nested documents have faster read operations than
parent/child (or a DIY equivalent), since the nested docs are stored in the
same Lucene segment as their parent. However, the penalty is that updating a
single field anywhere in the document causes a re-index of everything,
including the nested docs. If you update fields frequently, this is a
non-negligible overhead.
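
To put a number on that overhead, here is a small Python sketch. It assumes one hidden Lucene document per nested object plus one for the root document, which is how the docs describe nested indexing; the function name and the sample study are made up for illustration.

```python
def lucene_docs_rewritten(study):
    """Count the Lucene documents rewritten when `study` is re-indexed.

    Assumes one hidden Lucene document per nested publication plus one
    for the root study document.
    """
    return 1 + len(study.get("publication", []))


study = {
    "name": "Study A",
    "publication": [
        {"title": "Paper 1", "author": "Smith"},
        {"title": "Paper 2", "author": "Jones"},
        {"title": "Paper 3", "author": "Lee"},
    ],
}

# Changing only the study name still rewrites the root and every nested doc.
study["name"] = "Study A (renamed)"
rewritten = lucene_docs_rewritten(study)
print(rewritten)
```

With three nested publications, a one-field change rewrites four Lucene documents, and the cost grows with the number of nested objects.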

I wrote an article a while back explaining the high-level difference
between the various "relational" data types in ES. There is a summary
table at the bottom if you don't want to wade through all the text:
http://euphonious-intuition.com/2013/02/managing-relations-in-elasticsearch/

-Zach

On Tuesday, April 30, 2013 3:26:43 PM UTC+2, Ümit Seren wrote:

According to the docs about nested objects those will be internally
indexed as separate docs.

@Zachary Thanks for the feedback.
I actually read your blog post before, and it helped me understand the
relational data types.
My problem is that I basically have a many-to-many relationship between two
data types (publication and study).
I can't use parent/child because a child can only have one parent, whereas
in my case a child can have multiple parents.
So I have to rely on nested data types.

I basically have two concerns I can't get my head around.

1.) With nested data types I might have redundant copies of the same entity
in my index. If I am only interested in searching across the nested data
type (ignoring the parent entities), does it make sense to index the unique
instances of those nested types separately so that a search doesn't have to
go through all the redundant copies?

2.) Does it make sense to break up a many-to-many relationship by nesting
each document type inside the other?
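
To illustrate concern 1.), here is a rough Python sketch with made-up sample data. It assumes, purely for illustration, that the title identifies a publication uniquely; the helper names are hypothetical.

```python
def nested_copy_count(studies):
    """Total number of nested publication copies across all studies."""
    return sum(len(study.get("publication", [])) for study in studies)


def unique_publications(studies):
    """Collect each distinct publication once, keyed by its title."""
    unique = {}
    for study in studies:
        for pub in study.get("publication", []):
            unique.setdefault(pub["title"], pub)
    return list(unique.values())


paper_1 = {"title": "Paper 1", "author": "Smith"}
paper_2 = {"title": "Paper 2", "author": "Jones"}
studies = [
    {"name": "Study A", "publication": [paper_1, paper_2]},
    {"name": "Study B", "publication": [paper_1]},
    {"name": "Study C", "publication": [paper_1, paper_2]},
]

copies = nested_copy_count(studies)
separate_docs = unique_publications(studies)
print(copies, len(separate_docs))
```

Here a nested search would go through five nested copies, while a separate publication type would hold only two docs.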

On Tue, Apr 30, 2013 at 5:08 PM, Zachary Tong zacharyjtong@gmail.com wrote:

Storing a document "separately" (not a nested doc) is effectively a
Parent/Child setup, which ES provides support for. A Child doc is a
completely separate document that is linked to the parent with some special
routing, allowing ES to use queries like has_child, has_parent, etc.

Sorry for the late reply on this. Elasticsearch is fundamentally flat, so
many-to-many arrangements will necessarily duplicate data. There
is really no way around it, unless you want to model the relations yourself.

For example, you could have two types (publications and studies), where
each type has a field that lists the IDs of the related documents
(e.g. a publication lists its studies, and a study lists its
publications). This prevents data duplication, but forces you to perform
at least two queries to manually "join" the data. The new terms
lookup feature in 0.90.0 GA would help with this join logic and reduce the
number of queries needed.
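
As a rough illustration of that manual join, here is a Python sketch that uses in-memory dicts as stand-ins for the two types. The field names study_ids and publication_ids, the helper names, and the sample data are all hypothetical; in a real setup each step would be a request to ES.

```python
# Stand-ins for the two types; each doc lists the IDs of its relations.
publications = {
    "p1": {"title": "Paper 1", "study_ids": ["s1", "s2"]},
    "p2": {"title": "Paper 2", "study_ids": ["s2"]},
}
studies = {
    "s1": {"name": "Study A", "publication_ids": ["p1"]},
    "s2": {"name": "Study B", "publication_ids": ["p1", "p2"]},
}


def search_publications(title_text):
    """Query 1: stand-in for a search on the publication type."""
    needle = title_text.lower()
    return [p for p in publications.values() if needle in p["title"].lower()]


def fetch_studies(study_ids):
    """Query 2: stand-in for fetching studies by ID."""
    return [studies[sid] for sid in study_ids if sid in studies]


def studies_for_publication_title(title_text):
    """Join by hand: collect study IDs from the hits, then fetch the studies."""
    ids = []
    for pub in search_publications(title_text):
        for sid in pub["study_ids"]:
            if sid not in ids:  # keep each study once, in first-seen order
                ids.append(sid)
    return fetch_studies(ids)


names = [s["name"] for s in studies_for_publication_title("paper 2")]
print(names)
```

Each publication and study is stored once, and the price is the second round trip.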

The alternative, as you mentioned, is to denormalize your data into one of
the types so that data is duplicated. Either the parent or the child will
be duplicated - this is normal. The general advice is to think about what
you need to search on and structure your data accordingly. What data needs
to be searched, and what simply needs to be retrieved once you have search
results? Flatten your data so that the searchable fields are easy to
search and sort on.
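
A minimal Python sketch of that idea, assuming title and author are the fields you search on and that a hypothetical "abstract" field is only needed at retrieval time (so it stays in the separate publication doc). All names and data here are illustrative.

```python
# Fields copied into each study because they need to be searchable there.
SEARCHABLE_PUBLICATION_FIELDS = ("title", "author")


def denormalize_study(study, publications):
    """Build the study doc to index, embedding searchable publication fields."""
    nested = []
    for pub_id in study["publication_ids"]:
        pub = publications[pub_id]
        nested.append({field: pub[field] for field in SEARCHABLE_PUBLICATION_FIELDS})
    return {"name": study["name"], "publication": nested}


publications = {
    "p1": {"title": "Paper 1", "author": "Smith", "abstract": "Long text..."},
    "p2": {"title": "Paper 2", "author": "Jones", "abstract": "More text..."},
}
study = {"name": "Study B", "publication_ids": ["p1", "p2"]}

doc = denormalize_study(study, publications)
print(doc)
```

Only the searchable fields are duplicated into the study; anything else can be fetched from the publication doc once you have the search results.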

On Tue, Apr 30, 2013 at 11:24 AM, Ümit Seren uemit.seren@gmail.com wrote:

@Zachary Thanks for the feedback.
I actually read your blog before and it helps understand the relational
data types.