Term aggregation count too high

srd · June 21, 2017, 12:32pm

I'm using a bool query with a mix of match, prefix and phrase queries in the should, must and filter clauses, and running a terms aggregation over a keyword field. The results of the aggregation can be used to add further filters to the query (i.e. a simple faceted search).

As far as I understand aggregations, they should work (per shard) on the result set returned by the query. After going through the "Count is approximate" section of the terms aggregation page, I would understand if the doc_count returned by the aggregation were too small (since the relevant keyword might not be among the buckets returned from a given shard). But how can it be too large?
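To make that reasoning concrete, here is a toy model of the per-shard merge, not Elasticsearch's actual implementation. The function name `approximate_terms_agg` and the sample shard contents are made up for illustration; the only point is that dropping a term from a shard's top list can lower a merged count but never raise it.

```python
from collections import Counter


def approximate_terms_agg(shards, shard_size, size):
    """Simplified model: each shard reports only its top `shard_size` terms,
    and the coordinating node sums whatever it receives."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return dict(merged.most_common(size))


# Hypothetical keyword values of the matching documents on two shards.
shards = [
    ["a", "a", "a", "b", "b", "c"],
    ["a", "a", "c", "c", "c", "b"],
]

exact = Counter(term for shard in shards for term in shard)
approx = approximate_terms_agg(shards, shard_size=2, size=3)

# Every merged count is at most the true count; some are lower.
for term, count in approx.items():
    print(term, "approx:", count, "exact:", exact[term])
```

In this model "b" and "c" are each under-counted by one, because one shard left them out of its top two, while "a" is exact.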

However, I have a case where one keyword in the aggregation gives me a doc_count of 2, yet adding a filter on this keyword returns an empty set.

I've tried increasing size as well as shard_size to a value greater than the total number of buckets, in order to force an exact count, but the count stays the same. doc_count_error_upper_bound is 0 for this aggregation.
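For reference, a minimal sketch of the shape of request being described, written as a Python dict so it can be dumped to JSON. The field names `category` and `title`, the aggregation name `by_category`, and the helper `build_search` are placeholders, not the real mapping or code from this project.

```python
import json

FIELD = "category"  # hypothetical keyword field used for the facets


def build_search(extra_filters=(), size=1000, shard_size=1000):
    """Build a bool query plus a terms aggregation over FIELD.

    `extra_filters` holds facet values picked from a previous response;
    each one becomes a term filter on the same keyword field.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": "example"}}],  # placeholder clause
                "filter": [{"term": {FIELD: value}} for value in extra_filters],
            }
        },
        "aggs": {
            "by_category": {
                "terms": {"field": FIELD, "size": size, "shard_size": shard_size}
            }
        },
    }


# First request: no facet selected. Second request: one facet value applied.
unfiltered = build_search()
filtered = build_search(extra_filters=["some-keyword"])
print(json.dumps(filtered, indent=2))
```

With size and shard_size both above the number of distinct values, every shard returns all of its buckets, so the counts in the first response should match what the filtered request finds.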

Clearly I'm not understanding something about how the terms aggregation works. Do aggregations disregard certain query matches or filters when aggregating the results, so that I'm seeing a case where the 2 counted documents are filtered out later on?

Mark_Harwood · June 21, 2017, 12:55pm

You are correct in your assumption that the inaccuracies relate to under-counting, never over-counting.
Can you double check the filter that produces zero results? What does that query look like?

srd · June 21, 2017, 2:00pm

After banging my head against the wall for two hours before posting, I noticed on copying out the query text that we had an error in the pagination code: the from parameter was greater than the total number of items found. After fixing that, everything seems to be working as expected... When you're hip deep in complex stuff that's new to you, a trivial bug can completely trip you up. Ah well...
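For anyone hitting the same thing, a small sketch of the failure mode and one possible guard. The helper names `page_window` and `clamp_from` are made up for this example and are not our actual pagination code; it assumes from is always a multiple of the page size.

```python
def page_window(total_hits, from_, size):
    """Return the (start, end) range of hits a from/size request covers."""
    start = min(from_, total_hits)
    end = min(from_ + size, total_hits)
    return start, end


def clamp_from(total_hits, from_, size):
    """Pull an out-of-range offset back to the start of the last non-empty page."""
    if total_hits == 0:
        return 0
    last_page_start = ((total_hits - 1) // size) * size
    return min(from_, last_page_start)


# Two documents match, but a stale offset from the unfiltered result is reused.
start, end = page_window(total_hits=2, from_=10, size=10)
print("hits returned:", end - start)

# Resetting the offset makes the two documents visible again.
safe_from = clamp_from(total_hits=2, from_=10, size=10)
print("corrected from:", safe_from)
```

The aggregation still reports a doc_count of 2 in this situation because aggregations are computed over all matching documents, while from and size only select which hits are returned.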

Thank you for confirming my assumptions on the counting of the aggregations though, that helps in another part of the project :).

Mark_Harwood · June 21, 2017, 2:03pm

Some mistakes just want a bigger audience

