Sum aggregation on field by distinct value of another field with high cardinality

artur · December 2, 2019, 5:51am

I have a document like this:

{
    "_id": 1, 
    "factor": 100,
    "field_with_high_cardinality": 1000
}

I know that for each unique field_with_high_cardinality the factor is always the same.

I need to calculate a sum aggregation over the factor field, counting each unique (distinct) value of field_with_high_cardinality only once.
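To make the expected result concrete, here is a minimal Python sketch of what I am trying to compute, outside of Elasticsearch. The sample documents and the function name sum_factor_over_distinct are made up for illustration, and it assumes (as stated above) that factor is identical for all documents sharing the same field_with_high_cardinality value.

```python
def sum_factor_over_distinct(docs):
    """Sum `factor` once per distinct `field_with_high_cardinality` value."""
    factor_by_key = {}
    for doc in docs:
        # Assumption: factor is the same for every document with this key,
        # so keeping the first one seen is enough.
        factor_by_key.setdefault(doc["field_with_high_cardinality"], doc["factor"])
    return sum(factor_by_key.values())


# Hypothetical sample data: two documents share the key 1000.
docs = [
    {"_id": 1, "factor": 100, "field_with_high_cardinality": 1000},
    {"_id": 2, "factor": 100, "field_with_high_cardinality": 1000},
    {"_id": 3, "factor": 50, "field_with_high_cardinality": 2000},
]

total = sum_factor_over_distinct(docs)
print(total)  # 150: 100 (key 1000, counted once) + 50 (key 2000)
```

A plain sum over all three documents would give 250; the result I want is 150.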

I tried running a terms aggregation on field_with_high_cardinality, then an avg for each term (since all factor values within a term are the same, avg gives me the factor value back), and then a sum_bucket over the averages:

"aggs": {
    "terms_agg": {
      "terms": {
        "field": "field_with_high_cardinality",
        "size": 1000000
      },
      "aggs": {
        "avg_risks": {
          "avg": {
            "field": "factor"
          }
        }
      }
    },
    "sums":{
      "sum_bucket": {
        "buckets_path": "terms_agg.avg_risks"
      }
    }
}

But since I have a lot of field_with_high_cardinality values (almost 99% are unique) I get 2 problems:

1. Unacceptably long execution time.
2. All terms_agg buckets are returned as part of the response, but I need only the sums value.

