I have an index like this (simplified):
{
  "entity_id": {"type": "keyword"},
  "attr1": {
    "properties": {
      "attr1_id": {"type": "keyword"},
      "attr1_type": {"type": "keyword"},
      "attr1_value": {"type": "double"}
    }
  },
  "attr2": {
    "properties": {
      "attr2_id": {"type": "keyword"},
      "attr2_type": {"type": "keyword"},
      "attr2_value": {"type": "double"}
    }
  }
}
It is heavily denormalized. In the source data, the relationship between entity_id and attr1/attr2 is one-to-many. In this Elasticsearch index, if an entity_id has N attr1 rows and M attr2 rows, there are N * M documents for that entity_id.
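To make the denormalization concrete, here is a hypothetical entity with N = 2 attr1 rows and M = 2 attr2 rows (all IDs and values are made up, and the attr1_type/attr2_type fields are omitted):

{"entity_id": "e1", "attr1": {"attr1_id": "a1", "attr1_value": 10.0}, "attr2": {"attr2_id": "b1", "attr2_value": 5.0}}
{"entity_id": "e1", "attr1": {"attr1_id": "a1", "attr1_value": 10.0}, "attr2": {"attr2_id": "b2", "attr2_value": 7.0}}
{"entity_id": "e1", "attr1": {"attr1_id": "a2", "attr1_value": 20.0}, "attr2": {"attr2_id": "b1", "attr2_value": 5.0}}
{"entity_id": "e1", "attr1": {"attr1_id": "a2", "attr1_value": 20.0}, "attr2": {"attr2_id": "b2", "attr2_value": 7.0}}

Each attr1 row is repeated M = 2 times, so a naive sum over attr1_value would return 60 instead of the intended 10 + 20 = 30.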
I am trying to write a query that sums all of the attr1_value fields but ignores the duplicates. I first tried a query like this:
{
  "query": {...},
  "aggs": {
    "total_value_buckets": {
      "terms": {
        "field": "attr1.attr1_id"
      },
      "aggs": {
        "attr1_value_top": {
          "max": {"field": "attr1.attr1_value"}
        }
      }
    },
    "total_value": {
      "sum_bucket": {
        "buckets_path": "total_value_buckets>attr1_value_top"
      }
    }
  }
}
The attr1_value_top aggregation deduplicates the denormalized copies of attr1: all of the attr1_value fields for a given attr1.attr1_id are identical, so I just grab one of them with max(). This all appears to be working correctly.
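For illustration, against the hypothetical documents above the trimmed aggregation response would look roughly like this (response fields other than the bucket keys and values omitted):

{
  "aggregations": {
    "total_value_buckets": {
      "buckets": [
        {"key": "a1", "doc_count": 2, "attr1_value_top": {"value": 10.0}},
        {"key": "a2", "doc_count": 2, "attr1_value_top": {"value": 20.0}}
      ]
    },
    "total_value": {"value": 30.0}
  }
}

With only two distinct attr1_id values everything fits in the default bucket limit, which is why the total looks right here.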
The problem is that total_value does not sum up all of the buckets; it only sums the top 10 buckets that the terms aggregation happens to return (its default size). I'd like it to sum across all the buckets, including the ones that don't get returned.
If it makes it clearer, this would be the equivalent SQL:
SELECT SUM(top_attr1_value) AS total_attr1_value
FROM (
SELECT attr1_id, MAX(attr1_value) AS top_attr1_value
FROM the_index
GROUP BY attr1_id
) intermediate
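The only workaround I can see is to raise size on the terms aggregation so that (hopefully) every attr1_id gets its own bucket, along the lines of the sketch below (the 1000 is an arbitrary placeholder, not a real bound in my data):

{
  "query": {...},
  "aggs": {
    "total_value_buckets": {
      "terms": {
        "field": "attr1.attr1_id",
        "size": 1000
      },
      "aggs": {
        "attr1_value_top": {
          "max": {"field": "attr1.attr1_value"}
        }
      }
    },
    "total_value": {
      "sum_bucket": {
        "buckets_path": "total_value_buckets>attr1_value_top"
      }
    }
  }
}

But that requires guessing an upper bound on the number of distinct attr1_id values, so I'm hoping there is a way to make the pipeline aggregation sum across all buckets without it.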