Is it not possible to query on fields of the owning document inside a nested query? All I can find is the 'reverse nested' aggregation. I guess I need a 'reverse nested' query, but that doesn't seem to exist. Can someone prove me wrong or confirm that this doesn't exist?
Thanks!
P.S. My current workaround will be copying the parent document fields I need to every nested document. Ugh :s
Nested queries effectively lie about which Lucene documents they have matched: they test properties of nested docs but ultimately report back the match as the root doc. They can be combined with root-level query clauses using the bool query.
See slide 8 here for a visual representation of how this query logic works: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
(It is implemented slightly differently in Lucene with the order of owner and nested docs reversed but the principle is the same)
Thanks for the quick reply Mark. I was aware of the internals, as I understand that's why a nested context (the query) is needed to access those fields in the first place.
Unfortunately, a bool query combining nested queries is not the same as a single nested query containing a bool query. The former may have different nested documents matching different clauses, resulting in a match even if no single nested document fulfills the entire query.
I think I need example docs and queries to make this more concrete...
Sure. Given these documents:
{_id: 1, nested: [{name: ["John", "Smith"]}]}
{_id: 2, nested: [{name: ["John", "Doe"]}, {name: ["Jim", "Smith"]}]}
If I wanted to match only documents with a single nested document containing both John AND Smith, it's clear that this query:
{and: [{nested: {name: "John"}}, {nested: {name: "Smith"}}]}
will also match the second doc, because John and Smith are found in different nested documents. That's what I meant by "different nested documents matching different clauses". The solution in this case is obvious: just rewrite the query, bringing the boolean combination inside the nested query:
{nested: {and: [{name: "John"}, {name: "Smith"}]}}
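To make the difference between those two query shapes concrete, here is a minimal pure-Python simulation. It does not talk to Elasticsearch; `nested_match` and `has_name` are hypothetical helpers that only model the "any nested doc matches, report the root doc" behaviour described earlier in the thread.

```python
docs = [
    {"_id": 1, "nested": [{"name": ["John", "Smith"]}]},
    {"_id": 2, "nested": [{"name": ["John", "Doe"]}, {"name": ["Jim", "Smith"]}]},
]

def nested_match(doc, predicate):
    """Model of a nested query: the root doc matches if ANY nested doc satisfies the predicate."""
    return any(predicate(child) for child in doc["nested"])

def has_name(value):
    """Build a predicate that tests one nested doc for a name value."""
    return lambda child: value in child["name"]

# Shape 1: AND of two separate nested queries (each may hit a different nested doc).
separate = [d["_id"] for d in docs
            if nested_match(d, has_name("John")) and nested_match(d, has_name("Smith"))]

# Shape 2: a single nested query containing the AND (one nested doc must satisfy both).
combined = [d["_id"] for d in docs
            if nested_match(d, lambda c: has_name("John")(c) and has_name("Smith")(c))]

print(separate)  # [1, 2]
print(combined)  # [1]
```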
But that is not always possible. Once you start combining clauses that concern the root document with clauses that concern a nested document, no amount of reordering can bring the nested document clauses neatly together without changing semantics. The simplest case I can come up with:
(<root clause 1> AND <nested clause 1>) OR (<root clause 2> AND <nested clause 2>)
Where I want nested clauses 1 & 2 to apply to the same nested document.
The ideal solution (I think) would be a 'reverse_nested' query. I could then wrap the entire expression in a single nested query, and the root clauses each in a reverse nested query.
I was OK understanding the problem up to this point. The top-level query is a bool OR so nested clauses 1 and 2 are not required to match together?
Eh.. Right
That's just incidental though, if you invert the OR's and AND's you get the problem I described:
(<root clause 1> OR <nested clause 1>) AND (<root clause 2> OR <nested clause 2>)
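As a rough sketch of what evaluating that expression against a single nested document would mean, the snippet below models the workaround from the first post: copy the root fields into every nested doc, then test the whole expression per nested doc. The root fields `status` and `vip`, the clause contents, and the `denormalize` helper are all made up for illustration; nothing here is Elasticsearch API.

```python
def denormalize(root, nested_field="nested"):
    """Copy every root-level field into each nested doc under a 'root.' prefix."""
    root_fields = {"root." + k: v for k, v in root.items() if k != nested_field}
    return [{**child, **root_fields} for child in root[nested_field]]

def expression(child):
    # (<root clause 1> OR <nested clause 1>) AND (<root clause 2> OR <nested clause 2>)
    return ((child["root.status"] == "active" or "John" in child["name"])
            and (child["root.vip"] or "Smith" in child["name"]))

# Hypothetical document: both root clauses are false, John and Smith sit in different nested docs.
doc = {"_id": 2, "status": "inactive", "vip": False,
       "nested": [{"name": ["John", "Doe"]}, {"name": ["Jim", "Smith"]}]}

# Intended semantics: one nested doc must satisfy the whole expression.
same_nested_doc = any(expression(child) for child in denormalize(doc))

# What two independent nested queries give: each clause may hit a different nested doc.
john = any("John" in c["name"] for c in doc["nested"])
smith = any("Smith" in c["name"] for c in doc["nested"])
independent = (doc["status"] == "active" or john) and (doc["vip"] or smith)

print(same_nested_doc)  # False
print(independent)      # True
```

Note that in this model a root document without any nested docs can never match, even if both root clauses hold, so the per-nested-doc evaluation only makes sense for documents that have at least one nested doc.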
So these are the valid combos you accept?
rc1 + rc2
rc1 + nc2
rc2 + nc1
nc1 + nc2 (but only on the same nested doc)
That seems unusual but you could write this as
((rc1 OR rc2) AND (nested(nc1) OR nested(nc2) )) OR nested(nc1 OR nc2)
"That seems unusual"
Yeah it's an edgy case in a pretty complicated app, isolated and simplified, but it does happen in real life.
Anyway, I don't think your rewrite is entirely correct.
>>> (r1, n1, r2, n2) = (True, True, False, False)
>>> (r1 or n1) and (r2 or n2)
False
>>> ((r1 or r2) and (n1 or n2)) or (n1 or n2)
True
There probably is a correct way to rewrite it, but I have to deal with arbitrary nesting of and/or/not anyway. I will try and see if I can come up with an algorithm for rewriting the queries. Otherwise, I will have to duplicate the necessary fields of the root document in the nested document.
In either case, I think having a reverse_nested query might be of value. I will write up a clearer example and make a feature request.
Yeah, if my list of acceptable combos was correct I think the pseudo code for the related query is this:
((rc1 OR rc2) AND (nested(nc1) OR nested(nc2) )) OR nested(nc1 AND nc2)
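A brute-force check in the style of the Python session earlier in the thread can be used to compare this pseudo code with the original expression. The sketch treats each clause as a plain boolean, as that session did, so it effectively assumes a single nested doc per root doc; the function names are illustrative.

```python
from itertools import product

def original(r1, n1, r2, n2):
    # (rc1 OR nc1) AND (rc2 OR nc2)
    return (r1 or n1) and (r2 or n2)

def revised(r1, n1, r2, n2):
    # ((rc1 OR rc2) AND (nested(nc1) OR nested(nc2))) OR nested(nc1 AND nc2)
    return ((r1 or r2) and (n1 or n2)) or (n1 and n2)

# Collect every (r1, n1, r2, n2) assignment on which the two forms disagree.
mismatches = [combo for combo in product([False, True], repeat=4)
              if original(*combo) != revised(*combo)]

for combo in mismatches:
    print(combo)
# (False, False, True, True)
# (True, False, True, False)
# (True, True, False, False)
```

Under that simplification the two forms still disagree in three cases: rc1 + rc2 with no nested match is accepted by the original but not by the pseudo code, while rc1 + nc1 and rc2 + nc2 are accepted by the pseudo code but not by the original.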
I have created a feature request