Access "uncle" aggregation from bucket

Suppose I have documents with the following _source values:

[{"x": "AA", "y": 10},
{"x": "AA", "y": 20},
{"x": "AB", "y": 5},
{"x": "BB", "y": 50},
{"x": "BC", "y": 15},
{"x": "BC", "y": 0},
{"x": "CC", "y": -10}]

I want all docs

  • filtered by {"range": {"y": {"gte": 0}}}
  • grouped by a terms on x
  • with the sum of all y of that bucket
  • with the percentage of that local sum relative to the sum of y across the whole search context (all buckets)

I want to get a list of aggregations with a structure similar to this:

{
"sum_by_x": {
  "buckets": [{
    "key": "AA",
    "localSum": {"value": 30},
    "percentage": {"value": 0.3}
  },{
    "key": "AB",
    "localSum": {"value": 5},
    "percentage": {"value": 0.05}
  },{
    "key": "BB",
    "localSum": {"value": 50},
    "percentage": {"value": 0.5}
  },{
    "key": "BC",
    "localSum": {"value": 15},
    "percentage": {"value": 0.15}
  }]
},
"globalSum": {"value": 100}
}

I've tried many combinations and structures to make that global sum available as a buckets_path to the terms aggregation, but with no success so far.

My base query looks something like this for now (please ignore the boilerplate structure for the query):

{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "y": {
                  "gte": 0
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "sum_by_x": {
      "terms": {
        "field": "x"
      },
      "aggs": {
        "localSum": {
          "sum": {
            "field": "y"
          }
        }
      }
    },
    "globalSum": {
      "sum": {
        "field": "y"
      }
    }
  }
}

Is there any way I can access the globalSum while creating the sum_by_x buckets so I can calculate the percentage of each bucket towards the total?

This is unfortunately not currently possible. It's something we have talked about, would love to add, and have even attempted to add, but it requires overcoming some tricky problems in the reduce phase of aggregations.

For now the final part of this calculation needs to be done in the client application to obtain figures like the percentages you are after here.
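As a rough illustration of that client-side step, here is a minimal Python sketch. It assumes the search response has already been parsed into a dict and that the aggregation names match the query above (sum_by_x, localSum, globalSum). The add_percentages helper and the hand-written sample response are hypothetical and only mirror the numbers from the question.

```python
def add_percentages(response, terms_agg="sum_by_x",
                    local="localSum", total="globalSum"):
    """Attach a 'percentage' entry to every bucket of a terms aggregation.

    Hypothetical helper: the aggregation names default to the ones used
    in the query from the question.
    """
    aggs = response["aggregations"]
    global_sum = aggs[total]["value"]
    for bucket in aggs[terms_agg]["buckets"]:
        local_sum = bucket[local]["value"]
        # Guard against an empty result set where the global sum is 0.
        value = local_sum / global_sum if global_sum else None
        bucket["percentage"] = {"value": value}
    return aggs


# Hand-written sample shaped like the aggregation part of a search
# response (an assumption; real buckets also carry doc_count etc.).
response = {
    "aggregations": {
        "sum_by_x": {
            "buckets": [
                {"key": "AA", "localSum": {"value": 30.0}},
                {"key": "AB", "localSum": {"value": 5.0}},
                {"key": "BB", "localSum": {"value": 50.0}},
                {"key": "BC", "localSum": {"value": 15.0}},
            ]
        },
        "globalSum": {"value": 100.0},
    }
}

result = add_percentages(response)
for bucket in result["sum_by_x"]["buckets"]:
    print(bucket["key"], bucket["percentage"]["value"])
```

Because globalSum is computed in the same request and under the same query filter, dividing each localSum by it gives the share of the filtered total rather than of the whole index.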

The significant_terms aggregation provides some of those "percentage of background" stats, but its accounting is currently based on document counts, not on the quantities held in those documents.
There is an open issue for this [1], but we don't have any development on it just yet.

[1] https://github.com/elastic/elasticsearch/issues/12309