I am searching for a way to count the number of buckets matching a certain criterion.
Unlike in the question "Aggregation buckets count", the cardinality aggregation does not help in my case.
An example to make it clearer:
I need the number of values of a non-analyzed field that occur exactly once. Alternatively, the number of values that occur more than once would also do, since the number of distinct values is easy to retrieve.
Let's say my set contains a, a, b, a, c, c, d, e, e. Then I want 2 as the result, because b and d are the only values that appear once.
The cardinality aggregation would return the distinct count, i.e. 5 (also including a, c and e).
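To pin down the semantics, here is a small local sketch in plain Python (no Elasticsearch involved) using the sample values from the example. It shows the distinct count that cardinality reports, the count of values seen more than once, and the count actually wanted, which equals the first minus the second:

```python
from collections import Counter

# Sample values from the example above
values = ["a", "a", "b", "a", "c", "c", "d", "e", "e"]

counts = Counter(values)
distinct = len(counts)                                   # what cardinality reports
repeated = sum(1 for c in counts.values() if c >= 2)     # values seen more than once
unique_once = sum(1 for c in counts.values() if c == 1)  # the number actually wanted

print(distinct, repeated, unique_once)  # 5 3 2
```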
I could run a terms aggregation with min_doc_count set to 2, which would give me 3 buckets (a, c, e), and subtract that count from the cardinality. But that is far too expensive: I neither need nor want the values of these buckets.
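For reference, the workaround described above would look roughly like the following search request body, built here as a Python dict. This is a sketch under assumptions: the field name my_field, the aggregation names distinct_values and repeated_values, and the size value of 10000 are all hypothetical. Note that cardinality returns an approximate count, so the subtraction is approximate too.

```python
import json

FIELD = "my_field"  # hypothetical name of the non-analyzed (keyword) field

request_body = {
    "size": 0,  # no search hits needed, only aggregations
    "aggs": {
        # approximate number of distinct values (5 in the example)
        "distinct_values": {"cardinality": {"field": FIELD}},
        # one bucket per value occurring at least twice (a, c, e in the example)
        "repeated_values": {
            "terms": {
                "field": FIELD,
                "min_doc_count": 2,
                "size": 10000,  # assumption: large enough to return every repeated value
            }
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The expensive part is exactly the repeated_values buckets, which have to be returned in full only so they can be counted.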
The terms aggregation response itself only reports sum_other_doc_count, i.e. the number of documents that fell outside the returned buckets. The bucket count can be inferred from the result, but there is no sum_other_bucket_count. If there were, I could limit the size to 1 (as 0 means all) and get only 1 additional bucket with information I do not need (still better than a million).
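On the client side, the subtraction is then trivial once the full bucket list has come back. A minimal sketch, assuming a response to a request with a cardinality aggregation named distinct_values and a terms aggregation (min_doc_count 2) named repeated_values (both names hypothetical), and assuming size was large enough that no repeated value was cut off:

```python
def count_values_seen_once(response: dict) -> int:
    """Distinct values minus values occurring at least twice (approximate)."""
    aggs = response["aggregations"]
    distinct = aggs["distinct_values"]["value"]  # cardinality result, approximate
    repeated = aggs["repeated_values"]["buckets"]  # one bucket per repeated value
    return distinct - len(repeated)


# Shape of a response for the a, a, b, a, c, c, d, e, e example
example_response = {
    "aggregations": {
        "distinct_values": {"value": 5},
        "repeated_values": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "a", "doc_count": 3},
                {"key": "c", "doc_count": 2},
                {"key": "e", "doc_count": 2},
            ],
        },
    }
}

print(count_values_seen_once(example_response))  # 2
```

Only len(buckets) is used; the keys and per-bucket counts are transferred for nothing, which is the overhead in question.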
Any pipeline aggregation would also require computing the parent aggregation with all of its buckets first.
I would also be happy with a query that lets me apply such a filter. Then I could use a filter aggregation to keep only the hits whose value appears exactly once and count them.
My preferred solution (also for other use cases) would be a metrics aggregation that returns the number of buckets of an aggregation without returning the buckets' values.
Thanks in advance!