Best practice: Searching complex docs with Elasticsearch

I want to search a series of documents that have a nested structure, for
instance a GitHub issue. Ideally I need to use the ES "search lite" syntax:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/search-lite.html

I have some top-level data (assignee, created_at, etc.) and some nested
items: think comments, commits, etc.

Dumping an entire issue into ES as one document makes it easily searchable,
but some weird side effects come up, most notably around sorting on the
nested comments.

What's the best practice for this kind of document search? Is it better to
split the comments into separate documents with the issue metadata attached,
or to dump each issue into one big document?

Ideally, I'd like to be able to search for a document and sort by one of
its attributes, either at the top level or nested inside one of the
comments.
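For context, when comments are mapped as `nested`, the ES 1.x request body has to name the nested path in the sort clause, and because an issue can have many comments, a `mode` decides which comment's value represents the issue. A minimal sketch that only builds the request body, assuming a hypothetical mapping where `comments` is `nested` and has a `created_at` date; `nested_path` is the 1.x-era parameter (later releases moved it under a `nested` object).

```python
import json


def issue_sort_body(query_text, sort_field, descending=True):
    """Build a search body sorting issues by a top-level field or by a
    field inside nested comments (ES 1.x sort syntax; field names are
    hypothetical)."""
    order = "desc" if descending else "asc"
    if sort_field.startswith("comments."):
        # Nested field: name the nested path and choose which comment wins.
        # "max" ranks each issue by its newest (largest) comment value.
        sort_clause = {sort_field: {
            "order": order,
            "mode": "max" if descending else "min",
            "nested_path": "comments",
        }}
    else:
        # Top-level field such as created_at: a plain field sort.
        sort_clause = {sort_field: {"order": order}}
    return {
        "query": {"query_string": {"query": query_text}},
        "sort": [sort_clause],
    }


print(json.dumps(issue_sort_body("login bug", "comments.created_at"), indent=2))
```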

Any ideas?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42a3fffb-97ee-43e1-be68-02cd44bd9a28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hey Neil,

Sounds interesting. For these questions, I think it's helpful to consider
the interface you're building for the user. What's the fundamental "thing"
being shown in a list of search results?

Nested documents can be convenient, but generally I think the modeling for
this kind of scenario works best when you denormalize the data as much as
possible. In that approach, you'd index the children as individual
documents, and save the parent attributes onto them.
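A minimal sketch of that denormalization, assuming a hypothetical issue shape (`id`, `title`, `assignee`, `created_at`, `comments`) and hypothetical output field names. Each returned dict would be indexed as its own comment document.

```python
def denormalize_issue(issue):
    """Turn one issue into one document per comment, copying the parent
    issue attributes onto each child (field names are hypothetical)."""
    parent = {
        "issue_id": issue["id"],
        "issue_title": issue["title"],
        "issue_assignee": issue.get("assignee"),
        "issue_created_at": issue["created_at"],
    }
    docs = []
    for comment in issue.get("comments", []):
        doc = dict(parent)  # fresh copy so child documents don't share state
        doc.update({
            "comment_body": comment["body"],
            "comment_created_at": comment["created_at"],
        })
        docs.append(doc)
    return docs


issue = {
    "id": 42,
    "title": "Login fails",
    "assignee": "neil",
    "created_at": "2014-10-01T09:00:00Z",
    "comments": [
        {"body": "Seeing a timeout", "created_at": "2014-10-02T10:00:00Z"},
        {"body": "Fixed in master", "created_at": "2014-10-03T11:00:00Z"},
    ],
}
for doc in denormalize_issue(issue):
    print(doc)
```

An issue with no comments produces no documents here, so it would need a document of its own if it should still be searchable.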

Field Collapsing can help if you're matching against multiple Comments but
are more interested in showing and sorting the parent Issues they belong
to.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/top-hits.html
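A sketch of that field-collapsing request against denormalized comment documents: a `terms` aggregation groups matching comments by parent issue, `top_hits` keeps one representative comment per issue, and the buckets are ordered by a `max` sub-aggregation on the comment date. The field names are hypothetical, and as far as I know the URI ("search lite") form has no parameter for aggregations, so this needs a request body.

```python
import json


def collapsed_issue_search(text, size=10):
    """Search comment documents, group hits by parent issue, and order the
    groups by their most recent matching comment (terms + top_hits)."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not raw hits
        "query": {"match": {"comment_body": text}},
        "aggs": {
            "issues": {
                "terms": {
                    "field": "issue_id",
                    "size": size,
                    # Order buckets by the "latest_comment" metric below.
                    "order": {"latest_comment": "desc"},
                },
                "aggs": {
                    # One representative hit per issue: its newest match.
                    "top_comment": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"comment_created_at": {"order": "desc"}}],
                        }
                    },
                    "latest_comment": {"max": {"field": "comment_created_at"}},
                },
            }
        },
    }


print(json.dumps(collapsed_issue_search("timeout"), indent=2))
```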

In reply to Neil Middleton (neil@heroku.com), Mon, Oct 20, 2014 at 2:53 PM.

--
Nick Zadrozny

Cofounder, CEO
One More Cloud

websolr.com • bonsai.io

The GitHub issue parallel is a good one. There are top-level elements
(title, body, etc.) and comments nested under them. Common requests I see
are to find all issues with a certain string in a comment, ordered by
recency, but the item we want to show in the results is a link to the issue.

N
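A sketch of that request using search lite against denormalized comment documents, assuming a hypothetical `issues` index with a `comment` type and the hypothetical fields `comment_body`, `comment_created_at`, `issue_id` and `issue_title`. The URI form takes `q`, `sort` and `size` parameters; collapsing the hits to one entry per issue is done client-side here, which is a swapped-in alternative to field collapsing, not something proposed in the thread.

```python
from urllib.parse import urlencode


def search_lite_url(base, text, size=50):
    """Build a URI ("search lite") search over comment documents, newest
    comments first. Index, type and field names are hypothetical."""
    params = {
        "q": "comment_body:" + text,
        "sort": "comment_created_at:desc",
        "size": size,
    }
    return base + "/issues/comment/_search?" + urlencode(params)


def issues_by_recency(hits):
    """Collapse comment hits (already sorted newest first) to one entry per
    issue, keeping the order in which each issue first appears."""
    seen = set()
    issues = []
    for hit in hits:
        src = hit["_source"]
        if src["issue_id"] not in seen:
            seen.add(src["issue_id"])
            issues.append({"issue_id": src["issue_id"], "title": src["issue_title"]})
    return issues


print(search_lite_url("http://localhost:9200", "timeout"))
```

Because the deduplication happens after the fact, a page of `size` comment hits can yield fewer than `size` issues; grouping on the server avoids that.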

In reply to Nick Zadrozny (nick@onemorecloud.com), Mon, Oct 20, 2014 at 9:52 PM.