Facet on Nested Objects filtered by Outer object?

Is there a way to limit nested objects by something in the outer objects
and then do a facet on the nested objects?

What I tried was:
(1) Filter for a list of nested objects that match on two fields in
the nested object.
(2) Limit the nested objects to those whose outer object matches a query
or filter, by making the nested query part of a query on the outer object.

I thought that if I
(3) added a scope to the nested query,
and
(4) built a facet on the nested objects' scope,

I'd get exactly what I needed.

This approach doesn't work, because the outer filter or query (2) does
NOT affect the documents found in the nested scope for the facet (3).
Here is a cut-down version of my query to show what I am talking about.

{
  "query" : {
    "filtered" : {
      "query" : {
        "nested" : {
          "filter" : {      <--- (1) filter for only the nested objects I wanted; it works great.
            "and" : {
              "filters" : [ {
                "terms" : {
                  "MyNested.field1" : [ "X" ]
                }
              }, {
                "terms" : {
                  "MyNested.field2" : [ "a", "b", "c" ],
                  "execution" : "and"
                }
              } ]
            }
          },
          "path" : "MyNested",
          "_scope" : "MyNestedScope"      <--- (3) scoped for faceting
        }
      },
      "filter" : {      <--- (2) but limited to those that are nested within matching outer objects; it also works great.
        "terms" : {
          "accessControl" : [ "TheUserID" ]
        }
      }
    }
  },
  "facets" : {
    "SubPathFacet" : {
        …
      },
      "scope" : "MyNestedScope"      <--- (4) use the scope of nested objects to do stats on the matching nested objects.
    }
  }
}

Note I have to use either nested or child because the child/nested
object has to match 2 fields for this to work.
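
To illustrate why two fields on the same sub-object call for nested (or child) documents, here is a minimal Python sketch. It is a toy model, not Elasticsearch code: the sample document and both helper functions are made up for illustration, and the "flattened" behavior is only an approximation of how plain object arrays are indexed.

```python
# Toy model: why matching two fields needs nested (or child) objects.
# The document and field values below are illustrative only.
parent = {
    "MyNested": [
        {"field1": "X", "field2": "a"},
        {"field1": "Y", "field2": "b"},
    ]
}

def flattened_match(doc, f1, f2):
    """Plain object arrays behave roughly like per-field value lists,
    so the pairing between field1 and field2 is lost."""
    values1 = {n["field1"] for n in doc["MyNested"]}
    values2 = {n["field2"] for n in doc["MyNested"]}
    return f1 in values1 and f2 in values2

def nested_match(doc, f1, f2):
    """Nested objects are matched one at a time, so both fields must
    agree on the same sub-object."""
    return any(n["field1"] == f1 and n["field2"] == f2 for n in doc["MyNested"])

print(flattened_match(parent, "X", "b"))  # True: a cross-object false positive
print(nested_match(parent, "X", "b"))     # False: no single sub-object has both
print(nested_match(parent, "X", "a"))     # True
```

The false positive in the first call is the cross-object match that the nested mapping is meant to avoid.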

Since I can't seem to limit the nested objects to those that are in
matching outer objects, it seems my scheme to build a facet based on
specially built nested objects is impossible.
I can't see how a facet on nested objects would ever be useful unless
the only facet one needed was the totals of all nested objects.

Am I missing something? Is there some way I can facet on nested objects
that are narrowed/limited by a query or filter on the outer objects?

Do I have to change to child objects instead and upgrade to 0.19.10, so that
I can use "has_parent" to solve this problem?

-Paul

--

Could you give an example for (2)? Since a nested object (or a child) can
belong to only one parent object, I don't quite understand what you are
trying to achieve.

On Monday, November 5, 2012 1:16:45 AM UTC-5, P Hill wrote:

Is there a way to limit nested objects by something in the outer objects
and then do a facet on the nested objects?

--

On 11/5/2012 9:10 AM, Igor Motov wrote:

Could you give an example for (2)? Since a nested object (or a child)
can belong to only one parent object, I don't quite understand what
you are trying to achieve.

Summary: Facet on nested objects that meet some criteria and that are each
nested in a parent object that ALSO meets some criteria.
Not all nested objects, only a few of them; and not every nested object that
matches, only those whose parents also match some criteria.

All Parents > The matching Parents (> means the set on the left is
larger than the set on the right).
All nested objects > The matching nested objects.

I hope the above suggests the idea that I want to do a facet on the
nested/child objects. My problem is that there seems to be no way to identify
a scope that contains only those that match both the nested criteria and the
parent criteria, and then use that small set in the facet.

Talking about it like this, it sounds like something that could really
use a "has_parent" filter/query, but currently I'm only running 0.19.8 and
it doesn't have "has_parent".

If this were all about three fields in the parent, all I'd need would be a
filter to pick the parents and a facet run on the parents, but it is
not. At the simplest it is one (or more) fields in the parent and
2 fields in the nested object. Since the criteria involve 2 fields in the
nested object, I need to avoid cross (sub-)object matches; therefore, I
thought to use a set of nested objects and not just multiple plain objects
in the parent. The important field I'm joining to in the parent is the
multi-valued Access Control List (my example just said user ID).

Even though I've advised others on this list not to worry
about a "little" de-normalization, I'm reluctant to repeat all the
values of the entire ACL in every one of what are now the nested
objects. It seemed like a bit of a combinatorial explosion to me, but maybe
I'm over-worried.

Why? Because each parent's nested objects are already a repeated set of
de-normalized values, set up just for the needs of one particular query
in the application. It seemed excessive to turn one field from
the parent into two analyzed fields in a nested object and repeat the
nested objects with all appropriate variations, resulting in 5-15 nested
objects. Adding the non-analyzed ACL would then mean 5-15 repeats of the
ACL, one for each sub-object. That just seems like a lot of
de-normalization, since I've already taken two fields and turned them into
~100 de-normalized values. But what's another few hundred values in an index?

I could take my own advice and de-normalize some more data, and select
only within the nested object with a facet on that set of types. If I
did that, I'd be working with only the usual case of a query/filter with
a facet.
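
As a rough sketch of what that de-normalized variant might look like, the following Python builds a request body in the same 0.19-era DSL as the query earlier in this thread. Several things here are assumptions and not taken from a working setup: "MyNested.accessControl" is a hypothetical copy of the parent's ACL inside each nested object, the terms facet on "MyNested.field2" stands in for the facet body that was elided in the original query, and build_request is just an illustrative helper name.

```python
import json

# Sketch of a request body in the 0.19-era DSL used earlier in this thread.
# ASSUMPTION: "MyNested.accessControl" is a hypothetical field holding a copy
# of the parent's ACL inside every nested object.
def build_request(user_id, field1, field2_values):
    must = [
        {"term": {"MyNested.accessControl": user_id}},
        {"term": {"MyNested.field1": field1}},
    ]
    # One term clause per value: the nested object must carry all of them.
    must += [{"term": {"MyNested.field2": v}} for v in field2_values]
    return {
        "query": {
            "nested": {
                "path": "MyNested",
                "_scope": "MyNestedScope",
                "query": {"bool": {"must": must}},
            }
        },
        "facets": {
            "SubPathFacet": {
                # ASSUMPTION: a terms facet; the original facet body was elided.
                "terms": {"field": "MyNested.field2"},
                "scope": "MyNestedScope",
            }
        },
    }

body = build_request("TheUserID", "X", ["a", "b", "c"])
print(json.dumps(body, indent=2))
```

With every criterion living inside the nested object, the scope named on the nested query and the scope used by the facet refer to the same set, so no parent-level restriction has to leak into the nested scope.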

-Paul

--

Could you take a look at this example https://gist.github.com/4020172 and
modify the output so it meets your requirements?

On Monday, November 5, 2012 1:51:17 PM UTC-5, P Hill wrote:

Summary: Facet on nested objects that meet some criteria that are each
nested in one parent object that ALSO meets some criteria.

--

On 11/5/2012 12:38 PM, Igor Motov wrote:

Could you take a look at this example https://gist.github.com/4020172
and modify the output so it meets your requirements?

I will take a look at that sometime soon! Thanks, -Paul

--

Thank you for taking the time to start a gist on this.

I have been trying to work out why I can't build a query against my data
that is very much "like" yours and get the match-parent AND match-nested
behavior that your example shows.
Obviously, it is not as much "like" yours as I think it is. :)
I have yet to modify yours to include as much as my query does, but
I will keep trying.

My 1st question is:
Where is it defined that a scope can be declared on a subquery?
I thought scopes were about saving a filter result. Am I getting scope
confused with caching?

I am also not sure I have a correct user's model of how your query works.

When I write a
query: { bool: { must: [ { queryA } , { queryB, scope: "my_scope" } ] } }
I thought it was doing something like:

(1) INNER_JOIN( queryA(), queryB() )
That is, the results of running queryA and, separately (without
consideration of queryA), the results of running queryB are generated.
Only then are the two result sets joined together.

Thus I couldn't figure out how declaring a scope for queryB would
include queryA, which is why I was scratching my head about the
usefulness of a nested query scope.
But your working example suggests that it is more like:

(2) queryB(queryA())
queryB is applied to the documents returned by queryA(), thus the
results at the end of queryB are restricted by both queryA and
queryB, as you demonstrate.

Is (2) what is really going on, or is there another model of how it works?
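
To make the difference between the two mental models concrete, here is a small Python sketch. It is a toy simulation only and says nothing about what Elasticsearch actually does internally; the parents, ACL values, and the query_a/query_b predicates are all invented for illustration.

```python
# Toy data: parents with an ACL and a list of nested objects (all made up).
parents = {
    "p1": {"acl": ["alice"], "nested": [{"id": "n1", "field1": "X"}]},
    "p2": {"acl": ["bob"],   "nested": [{"id": "n2", "field1": "X"}]},
    "p3": {"acl": ["alice"], "nested": [{"id": "n3", "field1": "Y"}]},
}

def query_a(parent):
    """Parent-level criterion (stands in for queryA)."""
    return "alice" in parent["acl"]

def query_b(nested):
    """Nested-level criterion (stands in for queryB)."""
    return nested["field1"] == "X"

def model_join():
    """(1) Run both queries independently, then intersect on parents."""
    hits_a = {pid for pid, p in parents.items() if query_a(p)}
    scope_b = {n["id"] for p in parents.values() for n in p["nested"] if query_b(n)}
    hits_b = {pid for pid, p in parents.items() if any(query_b(n) for n in p["nested"])}
    return hits_a & hits_b, scope_b

def model_chain():
    """(2) Run queryB only over the parents that queryA returned."""
    survivors = {pid: p for pid, p in parents.items() if query_a(p)}
    scope_b = {n["id"] for p in survivors.values() for n in p["nested"] if query_b(n)}
    hits = {pid for pid, p in survivors.items() if any(query_b(n) for n in p["nested"])}
    return hits, scope_b

print(model_join())   # hits and the nested scope under model (1)
print(model_chain())  # hits and the nested scope under model (2)
```

Both models return the same parent hits; they differ only in which nested objects end up in the scope, which is exactly the set a scoped facet would count.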


Re-reading the facets page again, I spotted another place that I was
making an assumption related to scope.
http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
"There’s one important distinction to keep in mind. While search
/queries/ restrict both the returned documents and facet counts, search
/filters/ restrict only returned documents — but /not/ facet counts."
I was assuming that the term "search filters" means the final filter listed
directly within the "search" struct, not any filter mentioned within
a filtered query (or another query that includes a filter).
For example,
search {
  query: {
    filtered: {
      query: { ... },
      filter: { ... }   <-- I thought the quote does NOT refer to this.
    }
  },
  filter: { ... },      <-- I thought the quote refers to this.
  facets: { ... }
}

Am I wrong about that?

Alas, even re-working my query to be a series of clauses in a bool
query with NO filters didn't do the trick. Regardless, answers to the
above would probably help me understand what direction I should take on
this and any other queries.

-Paul

On 11/5/2012 5:22 PM, P. Hill wrote:

I will take a look at that sometime soon! Thanks, -Paul

--

Yeah I see your point now. The sample that I posted is not a proper
solution for this issue. I will try to come up with a better solution
shortly.

On Tuesday, November 6, 2012 2:17:54 PM UTC-5, P Hill wrote:

Thank you for taking the time to start a gist on this.

I have been trying to work out why I can't build a query against my data
very much "like" yours and get the match parent AND match nested
behavior that your example does.

--

On 11/7/2012 5:25 AM, Igor Motov wrote:

Yeah I see your point now. The sample that I posted is not a proper
solution for this issue. I will try to come up with a better solution
shortly.

I'm still interested in understanding where a scope comes from, and
understanding the quote from the facets page, but I was able to solve my
problem using
(1) no filters, and
(2) avoiding the "terms" query, because I seem to have some
misunderstanding about what it should do (or maybe it even has a bug).

See a complete discussion and working REST examples at:
https://gist.github.com/4035647

Thanks to Igor for the 1st cut at an example, it was very close to what
I needed.

-Paul

--

To better understand the quote from the facets page, imagine for a second
all documents in your index; this is the "global" scope. Now, execute the
"query" portion of your request against all these documents. You will get a
subset of your index that satisfies your query. This is the "main" scope,
and these are the documents that facets are calculated against. Now apply the
"filter" portion of your request to this subset; you will get another subset
that satisfies both the filter and query portions of your request. These are
your hits. The number of documents in this subset will be returned in
"total", and the top 10 (or whatever number you specified in the "size" field)
documents will be returned in the "hits" object. By "filter" portion, I
mean the top-level filter object, not the filters that can appear inside the
query as part of a filtered query. Does this make sense?
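
The three sets described above can be mimicked with a few lines of Python. This is only a toy simulation of the explanation, not Elasticsearch code: the sample documents, the search helper, and its parameter names are all invented for illustration.

```python
# Toy index: every document here belongs to the "global" scope.
docs = [
    {"id": 1, "text": "red apple",   "color": "red"},
    {"id": 2, "text": "green apple", "color": "green"},
    {"id": 3, "text": "red car",     "color": "red"},
]

def search(docs, query, top_filter, facet_field, size=10):
    # "query" portion: produces the "main" scope.
    main_scope = [d for d in docs if query(d)]
    # Facets are counted over the main scope, before the top-level filter.
    facet = {}
    for d in main_scope:
        facet[d[facet_field]] = facet.get(d[facet_field], 0) + 1
    # Top-level "filter" portion: narrows the hits but not the facet counts.
    hits = [d for d in main_scope if top_filter(d)]
    return {"total": len(hits), "hits": hits[:size], "facet": facet}

result = search(
    docs,
    query=lambda d: "apple" in d["text"],
    top_filter=lambda d: d["color"] == "red",
    facet_field="color",
)
print(result["total"])   # 1
print(result["facet"])   # {'red': 1, 'green': 1}
```

Only one document survives as a hit, yet the facet still counts the green apple, because the top-level filter is applied after the facet is computed.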

This morning I was also trying to expand this explanation to include the scope
of nested facets, but I think I stumbled upon a bug. So, I am still trying
to figure out what the correct behavior should be. Basically, the results that
you are going to see in the nested facets depend on the order of execution
of subqueries within the parent query, which doesn't seem right. It might
be easier to show it using an example:
https://gist.github.com/c505e3ce1f535d700f37 Maybe this was the reason why
your complex query didn't work.

On Wednesday, November 7, 2012 9:12:55 PM UTC-5, P Hill wrote:

See a complete discussion and working REST examples at:
https://gist.github.com/4035647

--

On 11/7/2012 7:24 PM, Igor Motov wrote:

Basically, the results that you are going to see in the nested facets
depend on the order of execution of subqueries within the parent
query, which doesn't seem right. It might be easier to show it using
an example: https://gist.github.com/c505e3ce1f535d700f37 Maybe this
was the reason why your complex query didn't work.

The scope of the subquery is the result of applying the nested query to
the set of all nested objects from all subqueries on the parent, which
makes perfect sense if it is a long chain of applying one function upon
another:
nestedScope = a set of some nested types =
aNestedQuery(parentSubQs(allDocuments))

That is as you said, but it doesn't explain how a series of "must"s and
"should"s would combine to make the parents used to find the nested items.

One answer is to apply all parent queries, then use the result as the set
of parents in which to find nested objects, and save the results of the
nested query as a scope. But then what does it mean if two
sub-queries in a parent query refer to two different nested objects,
since each nested query can contribute to scoring and boolean matching?

At that point the only choice may be to take the parts of a query in
the order found, stopping to retain any declared scope of nested IDs
resulting from a nested query.

Thinking through this and assuming order does matter gives me ideas to
look back at what I had and see if it was order dependent. Thanks for
that thought.

-Paul

--