I'm trying to produce the distribution of documents that match vs.
don't match a query, and get the cardinality of a field for both sets. The
idea is "Users who did" vs. "Users who did not". In reality I'm
running another aggregation under "did not" (otherwise I'd just subtract
one count from the total), but the query here illustrates the issue I'm
having:
I would expect (aggregations.has_thing.distinct_count.value +
aggregations.does_not_have_thing.distinct_count.value) to be close to
aggregations.total_distinct_count.value, but in reality it's pretty far off
(~+20%). Note that the doc_count values add up exactly to
hits.total, so I don't think this is an issue with the query, but I could
be wrong.
Any ideas what's up? Have I structured the query incorrectly? Is this a bug?
Or is this just expected behavior?
Some notes:
UserId's data type is long, but the values only fill up the integer
space (510,539 to 418,346,844).
I'm running Elasticsearch 1.1.0.
I've tried playing around with the precision threshold, but it doesn't
appear to make a difference.
On Thu, May 22, 2014 at 10:34 PM, Phil Price <philprice@gmail.com> wrote:
I would expect (aggregations.has_thing.distinct_count.value +
aggregations.does_not_have_thing.distinct_count.value) to be close to
aggregations.total_distinct_count.value, but in reality it's pretty far off
I think this result is to be expected if you have some user IDs that match
both criteria? E.g. if your index has these two documents:
Doh! You are correct, my bad. I assumed the filter was an exclusive "per
user" property, but in fact it is not.
Thanks for getting back to me.
Cheers
Phil
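
For anyone finding this thread later, here is a minimal Python sketch of the effect described above. The documents and the has_thing flag are made up for illustration; they are not the documents from the original reply or from my index. It only assumes that a single UserId can appear on both sides of the filter.

```python
# Hypothetical documents: UserId 1 appears both with and without the "thing".
docs = [
    {"UserId": 1, "has_thing": True},
    {"UserId": 1, "has_thing": False},
    {"UserId": 2, "has_thing": True},
    {"UserId": 3, "has_thing": False},
]


def distinct_users(documents):
    """Exact distinct UserId values (stand-in for a cardinality aggregation)."""
    return {d["UserId"] for d in documents}


# The filter splits *documents* into two disjoint buckets...
matching = [d for d in docs if d["has_thing"]]
non_matching = [d for d in docs if not d["has_thing"]]

# ...but the sets of *users* in those buckets can overlap.
has_thing = distinct_users(matching)
does_not_have_thing = distinct_users(non_matching)
total = distinct_users(docs)
overlap = has_thing & does_not_have_thing

# Document counts add up exactly, like doc_count vs. hits.total.
print(len(matching) + len(non_matching), "docs of", len(docs))

# Distinct counts overshoot the total by the number of users in both buckets.
print(len(has_thing) + len(does_not_have_thing), "vs total", len(total),
      "overlap", len(overlap))
```

So the sum of the two per-bucket distinct counts equals the total distinct count plus the number of users that show up in both buckets, which would explain an overshoot even if the cardinality estimate itself were exact.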
On Thursday, May 22, 2014 4:36:02 PM UTC-7, Adrien Grand wrote:
I think this result is to be expected if you have some user IDs that match
both criteria? E.g. if your index has these two documents: