Searching complex parent and child docs in the same query

I am storing information about companies in an ES index. Right now I have
a company type and a file type. Each file doc has a parent document
that is a company type. I would like to be able to search companies by
both the data within the company docs and within the child file docs.
I have been unable to figure out how to do this and am considering storing
the files as nested objects inside the company docs. My concern is
that this will create massive company docs that will cause some
unforeseen problems, as a company can have thousands of files associated
with it.

Company docs have a lot of information that I need to search by:
geolocation data, certifications, descriptions, titles, etc.

File docs also have a lot of information, such as title, description, and
keywords, as well as the content of the actual file (PDF, Word, PPT).

Searching for top children won't take company doc data into account.

Filtering by a query on the children won't allow me to add a weighting
based on file results.

What should I do? Will including thousands of file docs (with file
attachments) as nested objects inside the parent doc cause problems?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Would it be possible to push the company info into each file doc? This
would be painful, though, if the company data changes.
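To make the "push company info into each file doc" idea concrete, here is a rough sketch of the document shaping only, not a definitive implementation. The field names (name, description, certifications, location, title, content) and the denormalize helper are made up for illustration; nothing here talks to Elasticsearch.

```python
import json

def denormalize(company, file_doc,
                company_fields=("name", "description", "certifications", "location")):
    """Return a copy of file_doc with selected company fields embedded under 'company'.

    The field list is a hypothetical example; pick whatever you need to search on.
    """
    embedded = {key: company[key] for key in company_fields if key in company}
    merged = dict(file_doc)       # shallow copy so the original file doc is untouched
    merged["company"] = embedded  # company data now travels with every file doc
    return merged

company = {
    "name": "Acme Corp",
    "description": "Industrial welding supplies",
    "certifications": ["ISO 9001"],
    "location": {"lat": 40.0, "lon": -105.0},
}
file_doc = {"title": "Safety manual", "content": "Text extracted from the PDF"}

flat = denormalize(company, file_doc)
print(json.dumps(flat, indent=2))
```

The cost is the one mentioned above: when a company changes, every one of its file docs has to be rebuilt and re-indexed.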

I think your only other option is nested documents. This means that
whenever a file or the company data changes, the whole company doc,
including all of its nested files, gets re-indexed. Will this work at your
scale? I have no clue :)
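For the nested option, a minimal sketch of what the mapping and a combined query might look like, written as plain Python dicts that serialize to the JSON request bodies. The field names (files, files.content, description) and the company_search helper are assumptions for illustration, and other field mappings are left out because their syntax depends on the Elasticsearch version.

```python
import json

# Hypothetical mapping fragment: files live inside the company doc as nested objects.
company_mapping = {
    "properties": {
        "files": {"type": "nested"}
    }
}

def company_search(text):
    """Build a query that scores companies on their own fields and on nested files."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches against the company doc itself.
                    {"match": {"description": text}},
                    # Matches against the nested file objects; fields inside a
                    # nested query are addressed by their full path.
                    {
                        "nested": {
                            "path": "files",
                            "query": {"match": {"files.content": text}},
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(company_search("welding"), indent=2))
```

Because both clauses sit in the same bool query, matches in the files contribute to the company's score, which is what the parent/child attempts were missing. Whether thousands of nested files per company perform acceptably is something only a test at your scale can answer.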

So, you have to denormalize one way or another. Pushing the company data
into each file seems to be the cleaner approach.

Best Regards,
Paul

(In reply to Brian Jones's post of Thursday, March 28, 2013 9:31:14 PM UTC-6.)

You can issue the query on the file type and use the has_parent query or
filter to query on properties of the company.

Elasticsearch will hold the list of _ids of the matching parents in memory
and use it as a filter for the file docs.
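A rough sketch of such a request body, as a Python dict that serializes to JSON. The field names (content, description) and the files_for_companies helper are assumptions for illustration; has_parent and its parent_type parameter are the real query, but check the exact syntax against the version you run.

```python
import json

def files_for_companies(file_text, company_text):
    """Build a query over file docs, restricted to files whose parent company matches."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Condition on the file doc itself.
                    {"match": {"content": file_text}},
                    # Condition on the parent company doc.
                    {
                        "has_parent": {
                            "parent_type": "company",
                            "query": {"match": {"description": company_text}},
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(files_for_companies("safety manual", "welding"), indent=2))
```

Note that this is issued against the file type, so the hits are file docs rather than companies; you would still have to group them by parent on the client side to get a company list.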

Jaap Taal

(In reply to ppearcy's message of Fri, Mar 29, 2013 at 7:31 AM.)