I have a document like this:
{
"_id": 1,
"factor": 100,
"field_with_high_cardinality": 1000
}
I know that for each unique field_with_high_cardinality
the factor
is always the same.
I need to calculate sum
aggregation of factor
field considering only unique (distinct) values of field_with_high_cardinality
.
I tried to calculate terms
for field_with_high_cardinality
, then avg
for each term
(since all factor
fields are same, avg
will give me the risk
field back). Then sum_bucket
on avarages:
"aggs": {
"terms_agg": {
"terms": {
"field": "field_with_high_cardinality",
"size": 1000000
},
"aggs": {
"avg_risks": {
"avg": {
"field": "factor"
}
}
}
},
"sums":{
"sum_bucket": {
"buckets_path": "terms_agg.avg_risks"
}
}
}
But since I have a lot of field_with_high_cardinality
values (almost 99% are unique) I get 2 problems:
- Unacceptably long execution time
- All
risk_terms
buckets are returned as part of the response, but I need only thesums
field.