I have some top-level data (assignee, created_at, etc.) and some nested
items, such as comments or commits.
Dumping an entire document into ES makes it easily searchable but some
weird side effects come up, most notably around sorting on the nested
comments.
What's the best practice for this kind of document search? Is it better to
split the comments into separate documents with the issue metadata attached,
or to keep each issue as one big document?
Ideally, I'd like to be able to search for a document, and sort by one of
the attributes of the document, either at the top level, or nested inside
one of the comments.
Sounds interesting. For these questions, I think it's helpful to consider
the interface you're building for the user. What's the fundamental "thing"
being shown in a list of search results?
Nested documents can be convenient, but generally I think the modeling for
this kind of scenario works best when you denormalize the data as much as
possible. In that approach, you'd index the children as individual
documents, and save the parent attributes onto them.
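To make the denormalized approach concrete, here is a minimal sketch of flattening one issue into one document per comment before indexing. The field names (issue_id, issue_title, comment_body, comment_created_at and so on) and the shape of the input are assumptions for illustration, not anything from the actual data set.

```python
from typing import Any, Dict, List


def denormalize_issue(issue: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one issue into one document per comment.

    Each output document carries a copy of the parent issue's attributes,
    so a single flat document can be searched and sorted on either level.
    All field names here are hypothetical.
    """
    parent = {
        "issue_id": issue["id"],
        "issue_title": issue["title"],
        "issue_assignee": issue["assignee"],
        "issue_created_at": issue["created_at"],
    }
    docs = []
    for comment in issue.get("comments", []):
        doc = dict(parent)  # copy so each document is independent
        doc.update(
            {
                "comment_id": comment["id"],
                "comment_body": comment["body"],
                "comment_created_at": comment["created_at"],
            }
        )
        docs.append(doc)
    return docs


# Hypothetical sample issue with two comments.
sample_issue = {
    "id": 42,
    "title": "Deploy fails on push",
    "assignee": "neil",
    "created_at": "2014-10-18T09:00:00Z",
    "comments": [
        {"id": 1, "body": "Seeing a timeout", "created_at": "2014-10-19T10:00:00Z"},
        {"id": 2, "body": "Fixed in master", "created_at": "2014-10-20T14:00:00Z"},
    ],
}

comment_docs = denormalize_issue(sample_issue)
```

The trade-off is that any change to the parent (say, a new assignee) means reindexing every comment document that carries a copy of it.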
Field Collapsing can help if you're matching against multiple Comments but
are more interested in showing and sorting the parent Issues they belong
to.
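As a rough sketch of what that could look like, the function below builds a search body that matches comment documents, groups them by parent issue with a terms aggregation, orders the groups by their most recent matching comment, and keeps one representative comment per issue via top_hits. It assumes one document per comment with hypothetical fields issue_id, comment_body and comment_created_at; it only builds the request body and does not talk to a cluster. Newer Elasticsearch versions also have a dedicated collapse search parameter, so check what your version supports.

```python
import json
from typing import Any, Dict


def collapsed_issue_search(text: str, max_issues: int = 10) -> Dict[str, Any]:
    """Build a search body that collapses matching comments onto their issues.

    Field names (issue_id, comment_body, comment_created_at) are assumptions.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not raw hits
        "query": {"match": {"comment_body": text}},
        "aggs": {
            "issues": {
                # One bucket per parent issue, newest matching comment first.
                "terms": {
                    "field": "issue_id",
                    "size": max_issues,
                    "order": {"latest_comment": "desc"},
                },
                "aggs": {
                    # Sort key for the bucket: newest matching comment date.
                    "latest_comment": {"max": {"field": "comment_created_at"}},
                    # The single most recent matching comment for display.
                    "top_comments": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"comment_created_at": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


body = collapsed_issue_search("timeout", max_issues=5)
print(json.dumps(body, indent=2))
```

Each bucket then corresponds to one issue to link to in the results, with the parent attributes available on the top_hits document if they were copied onto the comments.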
On Mon, Oct 20, 2014 at 2:53 PM, Neil Middleton neil@heroku.com wrote:
The parallel of a GitHub issue is a good one. There are top-level elements
(title, body, etc.), and comments nested under those. The common requests I see
are to find all issues with a certain string in a comment, ordered by
recency, but the item we want to show in the results is a link to the issue.