Access "uncle" aggregation from bucket

fredericogalvao · September 8, 2016, 4:43pm

Suppose I have documents with the following _source values:

[{"x": "AA", "y": 10},
{"x": "AA", "y": 20},
{"x": "AB", "y": 5},
{"x": "BB", "y": 50},
{"x": "BC", "y": 15},
{"x": "BC", "y": 0},
{"x": "CC", "y": -10}]

I want all docs:

- filtered by {"range": {"y": {"gte": 0}}}
- grouped by a terms aggregation on x
- with the sum of all y in each bucket
- with each bucket's local sum expressed as a percentage of the sum of all y in this search context (all buckets)

I want to get a list of aggregations with a structure similar to this:

{
"sum_by_x": {
  "buckets": [{
    "key": "AA",
    "localSum": {"value": 30},
    "percentage": {"value": 0.3}
  },{
    "key": "AB",
    "localSum": {"value": 5},
    "percentage": {"value": 0.05}
  },{
    "key": "BB",
    "localSum": {"value": 50},
    "percentage": {"value": 0.5}
  },{
    "key": "BC",
    "localSum": {"value": 15},
    "percentage": {"value": 0.15}
  }]
},
"globalSum": {"value": 100}
}

I've done my share of research on many combinations and structures to try and make that global sum available as a bucket_path to the terms aggregation, but with no success so far.

My base query looks something like this for now (please ignore the boilerplate structure for the query):

{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "y": {
                  "gte": 0
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "sum_by_x": {
      "terms": {
        "field": "x"
      },
      "aggs": {
        "localSum": {
          "sum": {
            "field": "y"
          }
        }
      }
    },
    "globalSum": {
      "sum": {
        "field": "y"
      }
    }
  }
}

Is there any way I can access the globalSum while creating the sum_by_x buckets so I can calculate the percentage of each bucket towards the total?

colings86 · September 9, 2016, 7:36am

Unfortunately, this is not currently possible. It's something we have talked about and would love to add, and we have even attempted it, but it requires overcoming some tricky problems in the reduce phase of aggregations.

For now, the final part of this calculation needs to be done in the client application to obtain figures like the percentages you are after here.
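
As a rough illustration of that client-side step, here is a minimal Python sketch. It assumes the base query from the question has already been run and that its JSON response has been parsed into a dict. The function name add_percentages and the sample_response literal are hypothetical: the sample is written by hand to match the example documents and is not output captured from a real cluster.

```python
def add_percentages(response):
    """Attach a 'percentage' entry to every sum_by_x bucket.

    Assumes the response carries the 'sum_by_x' and 'globalSum'
    aggregations defined in the base query from the question.
    """
    aggs = response["aggregations"]
    total = aggs["globalSum"]["value"]
    for bucket in aggs["sum_by_x"]["buckets"]:
        local = bucket["localSum"]["value"]
        # Avoid dividing by zero when nothing matched or all values cancel out.
        share = local / total if total else None
        bucket["percentage"] = {"value": share}
    return aggs


# Hand-written stand-in for the search response (an assumption, not real output).
sample_response = {
    "aggregations": {
        "sum_by_x": {
            "buckets": [
                {"key": "AA", "localSum": {"value": 30.0}},
                {"key": "AB", "localSum": {"value": 5.0}},
                {"key": "BB", "localSum": {"value": 50.0}},
                {"key": "BC", "localSum": {"value": 15.0}},
            ]
        },
        "globalSum": {"value": 100.0},
    }
}

result = add_percentages(sample_response)
for bucket in result["sum_by_x"]["buckets"]:
    print(bucket["key"], bucket["percentage"]["value"])
```

Because the range filter is part of the query, globalSum already excludes the CC document, so no extra filtering is needed on the client.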

Mark_Harwood · September 9, 2016, 8:49am

The significant_terms aggregation provides some of those "percentage of background" stats but the basis of its accounting is currently volumes of docs, not quantities of values held on those docs.
There is an open issue for this [1] but we don't have any development on it just yet.

[1] https://github.com/elastic/elasticsearch/issues/12309
