I have a document like this:
{
"_id": 1,
"factor": 100,
"field_with_high_cardinality": 1000
}
I know that for each unique field_with_high_cardinality the factor is always the same.
I need to calculate sum aggregation of factor field considering only unique (distinct) values of field_with_high_cardinality .
I tried to calculate terms for field_with_high_cardinality , then avg for each term (since all factor fields are same, avg will give me the risk field back). Then sum_bucket on avarages:
"aggs": {
"terms_agg": {
"terms": {
"field": "field_with_high_cardinality",
"size": 1000000
},
"aggs": {
"avg_risks": {
"avg": {
"field": "factor"
}
}
}
},
"sums":{
"sum_bucket": {
"buckets_path": "terms_agg.avg_risks"
}
}
}
But since I have a lot of field_with_high_cardinality values (almost 99% are unique) I get 2 problems:
- Unacceptably long execution time
- All
risk_termsbuckets are returned as part of the response, but I need only thesumsfield.