Searching on nested docs - getting back the nested docs as a response


(liorg) #1

Hi,

We have a somewhat complex type holding nested docs with arrays (let's
assume a hierarchy of books, where each book has an array of pages
containing its metadata).

We want to search on the nested docs - find all the books that have
the term "XYZ" in one of their pages - but we want to get back not only the
book, but the matching pages themselves.

We understand that this is problematic to achieve with ES
(see https://github.com/elasticsearch/elasticsearch/issues/3022).

A parent-child model is difficult for us, as the data model
comes from our already existing MongoDB model (and besides, we're not sure a
parent-child model fits here).

So...

  1. Is there any workaround we can use to get the nested docs back
    (the actual pages)?
  2. If not, is there a recommended way to search the data again in
    memory after it has been narrowed down by the ES server?
  3. Any advice will be appreciated, as this is quite a big obstacle in our
    way to implementing a solution using ES.

thanks,

Lior

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7602d608-5730-472e-8259-763ff29614ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

This is usually solved using parent-child, but the real question here is
what you mean by needing to retrieve both books and pages.

Can you describe the actual scenario and what you are trying to achieve?

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(liorg) #3

Well, assume we have a book type. The book holds a lot of metadata, let's
say something like the following:
{
  "author": {
    "name": "Jose",
    "lastName": "Martin"
  },
  "sections": [{
    "chapters": [{
      "pages": [{
        "pageNum": 1,
        "numOfChars": 1000,
        "text": "let my people...",
        "numofWords": 125
      },
      {
        "pageNum": 2,
        "numOfChars": 1005,
        "text": "let my people go...",
        "numofWords": 150
      }],
      "chapterName": "the start"
    },
    {
      "pages": [{
        "pageNum": 3,
        "numOfChars": 1000,
        "text": "will do...",
        "numofWords": 125
      },
      {
        "pageNum": 4,
        "numOfChars": 1005,
        "text": "will do later on...",
        "numofWords": 150
      }],
      "chapterName": "the end"
    }],
    "sectionName": "prologue"
  }]
}

We want to search for all the pages that have "let my people" in their text
and more than 100 words.
When we use ES, we can use nested objects and query on the nested page
object - but the values actually returned are the books (parents) that have
those matching pages.
Now, if we want to show the user the pages they were looking for, we cannot
do that, as we get the whole book returned with all its metadata and
not just the nested objects that matched the criteria. We would need to
search again (maybe in memory?) for the pages that matched the criteria in
order to display the search results to the user. (The whole type is returned
because ES does not yet support returning only the nested objects that
matched the criteria.)

I hope it is clearer now.
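
For illustration, here is a minimal sketch of the in-memory re-search described above. It assumes each book comes back as the `_source` of a search hit, already parsed into a Python dict shaped like the example document. The helper names `iter_pages` and `matching_pages` are hypothetical, and the phrase check is a plain case-insensitive substring test, which is only an approximation of how ES analyzes and matches text.

```python
from typing import Any, Dict, Iterator, List

# Trimmed copy of the example book from this thread.
BOOK: Dict[str, Any] = {
    "author": {"name": "Jose", "lastName": "Martin"},
    "sections": [{
        "sectionName": "prologue",
        "chapters": [
            {"chapterName": "the start", "pages": [
                {"pageNum": 1, "numOfChars": 1000,
                 "text": "let my people...", "numofWords": 125},
                {"pageNum": 2, "numOfChars": 1005,
                 "text": "let my people go...", "numofWords": 150},
            ]},
            {"chapterName": "the end", "pages": [
                {"pageNum": 3, "numOfChars": 1000,
                 "text": "will do...", "numofWords": 125},
                {"pageNum": 4, "numOfChars": 1005,
                 "text": "will do later on...", "numofWords": 150},
            ]},
        ],
    }],
}


def iter_pages(book: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every page dict, walking sections -> chapters -> pages."""
    for section in book.get("sections", []):
        for chapter in section.get("chapters", []):
            for page in chapter.get("pages", []):
                yield page


def matching_pages(book: Dict[str, Any], phrase: str,
                   min_words: int) -> List[Dict[str, Any]]:
    """Return pages whose text contains `phrase` (case-insensitive
    substring, NOT Lucene analysis) and that have more than
    `min_words` words according to the stored numofWords field."""
    needle = phrase.lower()
    return [
        page for page in iter_pages(book)
        if needle in page.get("text", "").lower()
        and page.get("numofWords", 0) > min_words
    ]


if __name__ == "__main__":
    hits = matching_pages(BOOK, "let my people", 100)
    print([page["pageNum"] for page in hits])  # prints [1, 2]
```

Because the criteria are re-applied outside ES, results can differ from what the nested query matched (stemming, tokenization, and so on), so this is best treated as a stopgap.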



(Itamar Syn-Hershko) #4

It is very hard to give you concrete advice without knowing more about your
domain and use cases, but here are 2 points that came to mind:

  1. You can make use of the highlighting features to show the content that
    matched. Highlighters can return whole blocks of text, and by using
    positionIncrements correctly you can get this right.

  2. Yes, Elasticsearch is a document-oriented store, but is it really
    necessary for you to index entire books as one document? I'd most certainly
    look at indexing sections or chapters, maybe even pages, as single documents
    and use string references to the book ID. The exception is if you use data
    from the book level along with full-text searches on the texts, and even
    then in some scenarios I would consider denormalization.
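
As a rough sketch of point 2, the snippet below flattens one book into one document per page, each carrying a string reference to its book plus a few denormalized book-level fields. The function name `flatten_book`, the `bookId` field, and the choice of which fields to copy down are all assumptions made for illustration; they are not something prescribed in this thread.

```python
from typing import Any, Dict, List

# Small sample in the shape of the book document posted earlier.
BOOK: Dict[str, Any] = {
    "author": {"name": "Jose", "lastName": "Martin"},
    "sections": [{
        "sectionName": "prologue",
        "chapters": [
            {"chapterName": "the start", "pages": [
                {"pageNum": 1, "text": "let my people...", "numofWords": 125},
                {"pageNum": 2, "text": "let my people go...", "numofWords": 150},
            ]},
            {"chapterName": "the end", "pages": [
                {"pageNum": 3, "text": "will do...", "numofWords": 125},
                {"pageNum": 4, "text": "will do later on...", "numofWords": 150},
            ]},
        ],
    }],
}


def flatten_book(book_id: str, book: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one nested book into a list of standalone page documents."""
    docs: List[Dict[str, Any]] = []
    for section in book.get("sections", []):
        for chapter in section.get("chapters", []):
            for page in chapter.get("pages", []):
                doc = dict(page)              # copy the page's own fields
                doc["bookId"] = book_id       # string reference to the book
                # Denormalized context, copied so a page hit can be shown
                # without fetching the book (hypothetical field choice).
                doc["sectionName"] = section.get("sectionName")
                doc["chapterName"] = chapter.get("chapterName")
                doc["author"] = book.get("author")
                docs.append(doc)
    return docs


if __name__ == "__main__":
    for doc in flatten_book("book-1", BOOK):
        print(doc["bookId"], doc["chapterName"], doc["pageNum"])
```

With pages indexed this way, a search hit is the page itself, so no second pass over the book is needed. The cost is that book-level changes have to be re-applied to every page document.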

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(liorg) #5

I am not sure highlighting will work, as I suspect it will encounter the same
obstacle, see in:

As for suggestion #2, this will break our current schema and will require a
significant model change (we store the data in MongoDB as well) - so I am
not sure whether we are better off waiting until #3022 is solved. In the
meantime, any workaround will be appreciated...

Can we do some in-memory searching again (using native Lucene somehow)?


