Is there a way to find documents using a (multi) match query on their nested subdocuments? For example, I have a post. Any post can have multiple tags and comments.
Posts must be searchable forever but mostly change in the first few days, so I guess it makes sense to use nested objects instead of a parent-child relation.
I'd like to find this post using the query "elasticsearch question" with the "and" operator, where "elasticsearch" is contained in a different subdocument than "indeed". It would also be nice to do the same thing with some kind of multi-match query of type cross_fields, e.g. "elasticsearch comment".
I know I don't need nested objects for this kind of query. But with a flat object it's impossible to find a document by a tag of author 3 that also has the text "question".
You won't be able to answer those kinds of queries with just a plain nested (or parent/child) object. The problem is that each nested document is evaluated individually, as if it were an independent document. So when the query is looking for "elasticsearch question", it only sees "Elasticsearch" and "Question" independently, in two different nested docs.
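To illustrate the point, here is a toy simulation in plain Python (no Elasticsearch involved). The post and its field names ("tags", "name", "author") are made up for the example, and the matching is a crude stand-in for a match query with the "and" operator, not how Lucene actually scores anything.

```python
# Hypothetical post; the field names are assumptions for illustration only.
post = {
    "tags": [
        {"author": 3, "name": "Elasticsearch"},
        {"author": 3, "name": "Question"},
    ],
}

def contains_all(text, terms):
    """True if every term occurs as a lowercased whitespace token in text."""
    tokens = text.lower().split()
    return all(term in tokens for term in terms)

def nested_style_match(nested_docs, field, terms):
    """Mimics a nested query: one nested doc must satisfy the whole query alone."""
    return any(contains_all(doc[field], terms) for doc in nested_docs)

def flat_style_match(nested_docs, field, terms):
    """Mimics searching a single flattened field holding all nested values."""
    combined = " ".join(doc[field] for doc in nested_docs)
    return contains_all(combined, terms)

terms = ["elasticsearch", "question"]
nested_hit = nested_style_match(post["tags"], "name", terms)
flat_hit = flat_style_match(post["tags"], "name", terms)
print(nested_hit, flat_hit)  # prints: False True
```

No single tag contains both terms, so the nested-style check fails, while the flattened view of the same values matches.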
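The usual workaround is to use copy_to on the nested fields so their values are also copied into a flat field on the root document. Then when you need to do a cross-nested-doc query, you can search that copy_to field. And when you need a query to check individual tags/comments in relation to the author, you can use the regular nested query.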
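Something along these lines might work as a starting point. It's only a sketch: the request bodies are built as plain Python dicts, the field names ("tags.name", "tags.author", "tags_flat") are assumptions since the real mapping isn't shown in this thread, and the "text"/"integer" types assume a recent Elasticsearch version (older releases such as 1.x use "string").

```python
import json

# Assumed mapping: each nested tag name is also copied into a root-level field.
mapping = {
    "mappings": {
        "properties": {
            "tags_flat": {"type": "text"},
            "tags": {
                "type": "nested",
                "properties": {
                    "author": {"type": "integer"},
                    "name": {"type": "text", "copy_to": "tags_flat"},
                },
            },
        }
    }
}

# Cross-nested-doc query: both terms must appear, but they may come
# from different tags because tags_flat holds all tag names together.
cross_doc_query = {
    "query": {
        "match": {
            "tags_flat": {"query": "elasticsearch question", "operator": "and"}
        }
    }
}

# Per-tag query: text and author must match inside the same nested tag.
per_tag_query = {
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"tags.name": "question"}},
                        {"term": {"tags.author": 3}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(cross_doc_query, indent=2))
print(json.dumps(per_tag_query, indent=2))
```

The dicts can be sent as JSON bodies with whatever client you use; the mapping goes into index creation and the other two into search requests.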
If you want cross-field queries, you can do something similar and combine both tags and comments into a single field. So you could potentially have "tags_flat", "comments_flat" and "tags_comments_flat" fields.
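As a rough sketch, the two variants could look like this, again as plain Python dicts. It assumes the flat fields named above already exist in the mapping and are filled via copy_to; nothing here is taken from a real index.

```python
import json

# Variant 1: cross_fields multi_match over the two separate flat fields.
# Each term may match in either field, but all terms must match somewhere.
cross_fields_query = {
    "query": {
        "multi_match": {
            "query": "elasticsearch comment",
            "type": "cross_fields",
            "operator": "and",
            "fields": ["tags_flat", "comments_flat"],
        }
    }
}

# Variant 2: a plain match against one combined field that receives
# both tag and comment values.
combined_field_query = {
    "query": {
        "match": {
            "tags_comments_flat": {
                "query": "elasticsearch comment",
                "operator": "and",
            }
        }
    }
}

print(json.dumps(cross_fields_query, indent=2))
print(json.dumps(combined_field_query, indent=2))
```

The combined field keeps the query simple at the cost of a third copy of the data, while the cross_fields variant gets by with the two per-type fields.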
Probably goes without saying, but this increases the size of your index, since you're duplicating data.
So you mean some kind of custom _all field? This is what I planned to do and what I already implemented for nested characteristics of hardware products of an e-commerce system using Elasticsearch 1.5.x.
I simply wasn't sure if there's a better solution in the latest release of Elasticsearch.
Yep! You're basically building a custom _all field which includes the denormalized values of the children too. There used to be an include_in_root option which would do exactly this, but it was removed since the same can be done (more generically) with copy_to.
That's pretty much the only solution in this case, regardless of version... nothing new to help out. Relational support in ES is fairly limited: since it can only consider one document at a time, you need to do these kinds of tricks.