How to sum the results of multiple buckets


#1

I have data like:

Name Update_Date Amount
AAA 2001-08-28T16:21:46 60
BBB 2001-08-27T06:11:52 50
CCC 2001-06-06T20:53:12 40
AAA 2001-08-28T19:33:21 30
BBB 2001-08-28T00:20:46 20
CCC 2001-08-30T23:55:51 10

I want to do a request to search for the row with earliest Update_date in each Name group, and sum up the Amount. The required output would be 60+50+40= 150.

I use aggs to divide data into three buckets and use top_hits to get the data with earliest Update_date in each group, but I can not sum up the Amount of three buckets.

Does anyone can help?


SUM top hit aggegration?
(Gordon Brown) #2

Unfortunately, I don't think there's a way to do this using the built-in aggregations. You can, however, get the behavior you want using a Scripted Metric aggregation, rather than a Top Hits aggregation.

First, let's ingest your example documents into Elasticsearch:

POST testindex/_doc/_bulk
{"index": {}}
{"name": "AAA", "date": "2001-08-28T16:21:46", "amount": 60}
{"index": {}}
{"name": "BBB", "date": "2001-08-27T06:11:52", "amount": 50}
{"index": {}}
{"name": "CCC", "date": "2001-06-06T20:53:12", "amount": 40}
{"index": {}}
{"name": "AAA", "date": "2001-08-28T19:33:21", "amount": 30}
{"index": {}}
{"name": "BBB", "date": "2001-08-28T00:20:46", "amount": 20}
{"index": {}}
{"name": "CCC", "date": "2001-08-30T23:55:51", "amount": 10}

Now, we'll use:

  1. A Terms aggregation to bucket by name, just like you did.
  2. Instead of Top Hits, a Scripted Metric aggregation to essentially do a min operation, but keep track of the associated amount value as well. Once we've found the minimum in each bucket by date, we'll return the amount associsated with that date.
  3. A Sum Bucket aggregation to find the overall total.

Here's the resulting query, which I've also uploaded as a Github Gist, which may be easier to read.

GET testindex/_search
{
  "size": 0,
  "aggs": {
    "names": {
      "terms": {
        "field": "name.keyword"
      },
      "aggs": {
        "oldest_amount": {
          "scripted_metric": {
            "map_script": """
              if (state.oldest_date == null 
                  || doc.date.value.isBefore(state.oldest_date)) {
                state.oldest_date = doc.date.value;
                state.amount = doc.amount.value;
              }""",
            "reduce_script": "return states.stream().min(Comparator.comparing(i->i.oldest_date)).map(i->i.amount).get()"
          }
        }
      }
    },
    "total_amount": {
      "sum_bucket": {
        "buckets_path": "names>oldest_amount.value"
      }
    }
  }
}

Note the query is in Kibana Console format, which converts strings inside triple quotes into normal, JSON single-line quotes automatically. If you want to use this via cURL or another standard HTTP client, you'll have to remove the newlines by hand, or copy this into Kibana and use the "Copy as cURL" functionality.

This will return a response similar to this, although I've left out all but the important parts for brevity:

{
  "aggregations": {
    "names": {
      "buckets": [
        {
          "key": "AAA",
          "oldest_amount": {
            "value": 60
          }
        },
        {
          "key": "BBB",
          "oldest_amount": {
            "value": 50
          }
        },
        {
          "key": "CCC",
          "oldest_amount": {
            "value": 40
          }
        }
      ]
    },
    "total_amount": {
      "value": 150
    }
  }
}

And the total sum is in aggregations.total_amount.value as we requested.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.