Incorrect facet counts when using parent/child documents


(Mark Birbeck) #1

Hi there,

I'm trying to use a date histogram facet to count the number of activities
that have happened to particular objects between two dates. It works fine
if the actions are stored as nested documents, but with that technique
inserting new actions is very slow. Storing the actions as separate
documents linked to their article via _parent makes the data much easier
to manage, but then the facet values returned from queries are incorrect.
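For readers unfamiliar with the two layouts, here is a minimal sketch of the parent/child variant, assuming the old-style mapping where a child type declares `_parent` and each child is indexed with a `?parent=` parameter. The index, type and id names are hypothetical, not taken from the gist. With a nested mapping, by contrast, adding one action means reindexing the whole article, which is why inserts are slow.

```python
import json

# Hypothetical index and type names, for illustration only.
INDEX = "articles_index"
PARENT_TYPE = "article"
CHILD_TYPE = "action"

# Declaring _parent on the child type links each action to an article,
# so each action is its own document and can be indexed on its own.
child_mapping = {CHILD_TYPE: {"_parent": {"type": PARENT_TYPE}}}


def child_index_path(action_id, article_id):
    """Build the request path for indexing one action under its parent article."""
    return f"/{INDEX}/{CHILD_TYPE}/{action_id}?parent={article_id}"


if __name__ == "__main__":
    print(json.dumps(child_mapping))
    print(child_index_path("a1", "1"))
```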

The following script illustrates the problem:

https://gist.github.com/1433393
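The gist isn't reproduced here, but as a rough sketch, assuming the 0.18-era query DSL (where a `has_child` query accepts a `_scope` name and a facet can target that `scope`), a request of the shape described below might be built in Python like this. The `action` type, the `title` and `created` fields and the `actions` scope name are all hypothetical. The facet output in this post also reports min/max/total per bucket, which suggests the gist's facets use a value field; the sketch leaves that out.

```python
import json

# Hypothetical scope name shared by the has_child query and the facet.
SCOPE = "actions"


def build_search(title_term, start, end):
    """Articles whose title matches title_term and that have an action in [start, end]."""
    child_query = {"range": {"created": {"from": start, "to": end}}}
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"default_field": "title", "query": title_term}},
                    # _scope names the set of matching child docs so a facet can use it.
                    {"has_child": {"type": "action", "query": child_query, "_scope": SCOPE}},
                ]
            }
        },
        "facets": {
            # The facet runs over the scoped child documents, not the parent hits.
            "created_facet": {
                "date_histogram": {"field": "created", "interval": "day"},
                "scope": SCOPE,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search("One", "2011-11-18", "2011-11-24"), indent=2))
```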

The main search is for articles that have 'One' in the title and that
also have actions that took place between two dates. This correctly returns
one article, which in turn has one action. However, the facet values are
calculated across all actions for all articles, like this:

{
  "facets": {
    "created_facet": {
      "_type": "date_histogram",
      "entries": [
        {"time": 1321574400000, "count": 2, "min": 1.0, "max": 1.0, "total": 2.0, "total_count": 2, "mean": 1.0},
        {"time": 1322006400000, "count": 1, "min": 0.0, "max": 0.0, "total": 0.0, "total_count": 1, "mean": 0.0}
      ]
    },
    "published_facet": {
      "_type": "date_histogram",
      "entries": [
        {"time": 1321574400000, "count": 2, "min": 0.0, "max": 0.0, "total": 0.0, "total_count": 2, "mean": 0.0},
        {"time": 1322006400000, "count": 1, "min": 1.0, "max": 1.0, "total": 1.0, "total_count": 1, "mean": 1.0}
      ]
    }
  }
}

As you can see, the facets count three actions, which is the total number
of actions across all of the articles rather than the one action on the
matching article.

According to the documentation, this is not the correct behaviour for
facets applied to a query, although it is the correct behaviour for facets
applied to a filter. Since the documentation also says that the 'has_child'
query is simply a wrapper around the 'has_child' filter, my guess is that
I'm getting the filter behaviour whether I want it or not.
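To make the discrepancy concrete, here is a toy model in plain Python (not Elasticsearch code) of the two ways of counting. The data are hypothetical, chosen only so that the numbers have the same shape as the facet output above: `scoped_counts` counts every action in the date range whatever its article, which is consistent with the reported facets, while `wanted_counts` counts only the in-range actions of articles that match the whole query.

```python
DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical data (not the gist's): article "1" has one action, article "2" has two.
ARTICLES = {"1": "Article One", "2": "Article Two"}
ACTIONS = [  # (parent article id, timestamp in ms since the epoch, UTC)
    ("1", 1321574400000 + 3_600_000),
    ("2", 1321574400000 + 7_200_000),
    ("2", 1322006400000 + 3_600_000),
]


def day_bucket(ts_ms):
    """Round a timestamp down to the start of its UTC day (a one-day interval)."""
    return ts_ms - ts_ms % DAY_MS


def in_range(ts, start, end):
    return start <= ts <= end


def scoped_counts(start, end):
    """Count every child matching the child query, regardless of its parent."""
    counts = {}
    for _, ts in ACTIONS:
        if in_range(ts, start, end):
            counts[day_bucket(ts)] = counts.get(day_bucket(ts), 0) + 1
    return counts


def wanted_counts(title_term, start, end):
    """Count only in-range children of parents that match the whole query."""
    parents = {
        pid for pid, title in ARTICLES.items()
        if title_term in title
        and any(p == pid and in_range(ts, start, end) for p, ts in ACTIONS)
    }
    counts = {}
    for p, ts in ACTIONS:
        if p in parents and in_range(ts, start, end):
            counts[day_bucket(ts)] = counts.get(day_bucket(ts), 0) + 1
    return counts


if __name__ == "__main__":
    start, end = 1321574400000, 1322092800000
    print(sorted(scoped_counts(start, end).items()))
    # → [(1321574400000, 2), (1322006400000, 1)]
    print(sorted(wanted_counts("One", start, end).items()))
    # → [(1321574400000, 1)]
```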

Is this something that could be changed in ES? I think my data layout is
probably a fairly common pattern, so I'm tempted to raise an issue for it.
Or can anyone think of a different way to structure the query to get what I
want? For example, a facet filter might appear to provide a solution, but
the query expressed in the facet filter seems to apply to the 'scope'
specified in the facet, which in my case is the child documents. Is there
any way around that?

Regards,

Mark Birbeck
