Querying by children that each contain a token of the query

Hello,

is there a way to find documents using a (multi) match query on their nested subdocuments? For example, I have a post. Any post can have multiple tags and comments.

{
  "author": 1,
  "tags": [
    {
      "id": 1,
      "author": 2,
      "text": "Elasticsearch"
    },
    {
      "id": 2,
      "author": 3,
      "text": "Question"
    }
  ],
  "comments": [
    {
      "id": 1,
      "author": 1,
      "text": "A helpful comment."
    },
    {
      "id": 2,
      "author": 4,
      "text": "Indeed."
    }
  ]
}

Posts must be searchable forever but mostly change in the first few days, so I guess it makes sense to use nested objects instead of a parent-child relation.

I'd like to find this post using the query "elasticsearch question" with the "and" operator, where "elasticsearch" is contained in a different subdocument than "question". Further, it would be nice to do the same thing using some kind of multi-match query with the cross_fields type, e.g. "elasticsearch comment".

I know I don't need nested objects for this kind of query. But with flat objects it's impossible to find a document by a tag that belongs to author 3 and also has the text "question".

Kind regards,
Marco

You won't be able to answer those kinds of queries with just a plain nested (or parent/child) object. The problem is that each nested document is evaluated individually, as if it were an independent document. So when the query is looking for "elasticsearch question", it only sees "Elasticsearch" and "Question" independently, in two different nested docs.
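
To make that concrete, here is a rough sketch in plain Python (no Elasticsearch involved). The tokens() helper is only a crude stand-in for an analyzer, and both function names are made up for illustration. It contrasts matching each nested doc on its own with matching against one combined bag of terms:

```python
def tokens(text):
    # Hypothetical stand-in for an analyzer: lowercase, drop periods, split on whitespace.
    return set(text.lower().replace(".", "").split())

# The tags from the example post above.
tags = [
    {"id": 1, "author": 2, "text": "Elasticsearch"},
    {"id": 2, "author": 3, "text": "Question"},
]

def nested_and_match(nested_docs, query):
    # Each nested doc is checked on its own: ALL query terms must be in ONE doc.
    wanted = tokens(query)
    return any(wanted <= tokens(doc["text"]) for doc in nested_docs)

def flat_and_match(nested_docs, query):
    # All nested texts are merged into one bag of terms first.
    wanted = tokens(query)
    bag = set().union(*(tokens(doc["text"]) for doc in nested_docs))
    return wanted <= bag

per_doc = nested_and_match(tags, "elasticsearch question")
merged = flat_and_match(tags, "elasticsearch question")
```

With the two tags above, the per-doc check fails because no single tag holds both terms, while the merged check succeeds.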

If you need this kind of cross-nested-document query, you can use copy_to to copy the text field up into the root document. This will give you a field on the root document which is essentially a bag of all terms from all the nested docs (that you choose to copy).
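
A mapping along these lines might look like the sketch below, written as a Python dict that would be sent as the body of a create-index request. It assumes the typeless mapping syntax of Elasticsearch 7 and later (older versions use "string" instead of "text" and put the properties under a type name). The field names tags_flat and comments_flat are my own choice, not something Elasticsearch defines:

```python
import json

# Assumed field names: "tags_flat" and "comments_flat" are illustrative only.
mapping = {
    "mappings": {
        "properties": {
            "author": {"type": "integer"},
            # Root-level fields that collect the text of all nested docs.
            "tags_flat": {"type": "text"},
            "comments_flat": {"type": "text"},
            "tags": {
                "type": "nested",
                "properties": {
                    "id": {"type": "integer"},
                    "author": {"type": "integer"},
                    # copy_to pushes each tag's text up into the root field.
                    "text": {"type": "text", "copy_to": "tags_flat"},
                },
            },
            "comments": {
                "type": "nested",
                "properties": {
                    "id": {"type": "integer"},
                    "author": {"type": "integer"},
                    "text": {"type": "text", "copy_to": "comments_flat"},
                },
            },
        }
    }
}

mapping_json = json.dumps(mapping, indent=2)
```

The nested fields themselves stay intact, so per-tag and per-comment queries keep working next to the flattened fields.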

So when you need to do a cross-nested-doc query, you can search that copy_to field. And when you need a query to check individual tags/comments in relation to the author, you can use the regular nested query.
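
As a sketch, the two kinds of search bodies could look like this (Python dicts to be sent as the body of a search request). It assumes a root-level field named tags_flat that is filled via copy_to from tags.text; that name is hypothetical:

```python
import json

# Cross-nested-doc search: both terms must occur, but they may come from different tags.
# "tags_flat" is an assumed copy_to target on the root document.
cross_nested_query = {
    "query": {
        "match": {
            "tags_flat": {"query": "elasticsearch question", "operator": "and"}
        }
    }
}

# Per-tag search: author and text must match inside the SAME nested tag.
per_tag_query = {
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"tags.author": 3}},
                        {"match": {"tags.text": "question"}},
                    ]
                }
            },
        }
    }
}

bodies = [json.dumps(q) for q in (cross_nested_query, per_tag_query)]
```

The first body only looks at the flattened field, so it cannot tell which author wrote which tag. The second keeps that relation because the nested query evaluates each tag separately.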

If you want cross-field queries, you can do something similar and combine both tags and comments into a single field. So you could potentially have "tags_flat", "comments_flat" and "tags_comments_flat" fields.
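
Roughly, the cross-field case could be queried in either of these two ways. Both are sketches that assume the flattened fields exist on the root document and are filled via copy_to (copy_to also accepts a list of targets, so a nested text field can feed both its own flat field and the combined one):

```python
import json

# Option 1: multi_match with cross_fields over the two flattened fields.
# Each term may be found in either field, but every term must be found somewhere.
cross_fields_query = {
    "query": {
        "multi_match": {
            "query": "elasticsearch comment",
            "type": "cross_fields",
            "operator": "and",
            "fields": ["tags_flat", "comments_flat"],
        }
    }
}

# Option 2: a plain match on one combined field holding tags AND comments.
combined_field_query = {
    "query": {
        "match": {
            "tags_comments_flat": {"query": "elasticsearch comment", "operator": "and"}
        }
    }
}

# Hypothetical mapping fragment for a nested text field feeding two targets.
tag_text_mapping = {"type": "text", "copy_to": ["tags_flat", "tags_comments_flat"]}

cross_fields_json = json.dumps(cross_fields_query)
```

Option 2 costs one more duplicated field but keeps the query simpler.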

Probably goes without saying, but this increases the size of your index, since you're duplicating data.

Hello Zachary,

many thanks for your quick reply!

So you mean some kind of custom _all field? This is what I planned to do and what I already implemented for nested characteristics of hardware products of an e-commerce system using Elasticsearch 1.5.x.

I simply wasn't sure if there's a better solution in the latest release of Elasticsearch :)

Kind regards,
Marco

Yep! You're basically building a custom _all field which includes the denormalized values of the children too. There used to be an include_in_root option which would do exactly this, but it was removed since the same can be done (more generically) with copy_to.

That's pretty much the only solution in this case, regardless of version... nothing new to help out. Relational support in ES is fairly limited: since it can only consider one document at a time, you need to do these kinds of tricks. :)