Return the buckets size after performing aggregations

elk_user188 · March 28, 2019, 1:38pm

Hi there!

I'd like to know the size of the buckets array in an aggregation response so that I can visualize it (as a metric, for example).

Let's say I have an index generated as follows:

PUT hockey/player/_bulk?refresh
{"index":{"_id":1}}
{"first":"johnny","last":"gaudreau","goals":[9,27,1],"assists":[17,46,0],"gp":[26,82,1],"born":"1993/08/13"}
{"index":{"_id":2}}
{"first":"johnny","last":"monohan","goals":[7,54,26],"assists":[11,26,13],"gp":[26,82,82],"born":"1994/10/12"}
{"index":{"_id":3}}
{"first":"johnny","last":"gaudreau","goals":[5,34,36],"assists":[11,62,42],"gp":[24,80,79],"born":"1984/01/04"}
{"index":{"_id":4}}
{"first":"micheal","last":"frolik","goals":[4,6,15],"assists":[8,23,15],"gp":[26,82,82],"born":"1988/02/17"}
{"index":{"_id":5}}
{"first":"sam","last":"bennett","goals":[5,0,0],"assists":[8,1,0],"gp":[26,1,0],"born":"1996/06/20"}
{"index":{"_id":6}}
{"first":"sam","last":"bennett","goals":[0,26,15],"assists":[11,30,24],"gp":[26,81,82],"born":"1983/03/20"}
{"index":{"_id":7}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/10"}
{"index":{"_id":8}}
{"first":"tj","last":"brodie","goals":[2,14,7],"assists":[8,42,30],"gp":[26,82,82],"born":"1990/06/07"}
{"index":{"_id":39}}
{"first":"mark","last":"giordano","goals":[6,30,15],"assists":[3,30,24],"gp":[26,60,63],"born":"1983/10/03"}
{"index":{"_id":10}}
{"first":"tj","last":"backlund","goals":[3,15,13],"assists":[6,24,18],"gp":[26,82,82],"born":"1989/03/17"}
{"index":{"_id":11}}
{"first":"joe","last":"colborne","goals":[3,18,13],"assists":[6,20,24],"gp":[26,67,82],"born":"1990/01/30"}
{"index":{"_id":12}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/12"}
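The bulk endpoint reads newline-delimited JSON, so each action line and each document must sit on exactly one line, and the body must end with a newline. As a sketch only, here is how such a body could be generated in Python; `to_bulk_body` is a hypothetical helper, not part of any Elasticsearch client, and only the first document is shown.

```python
import json


def to_bulk_body(docs):
    """Hypothetical helper: serialize (id, source) pairs into an NDJSON bulk body.

    Every action line and every source line is written as a single line of
    compact JSON, and the body ends with a trailing newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}, separators=(",", ":")))
        lines.append(json.dumps(source, separators=(",", ":")))
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    (1, {"first": "johnny", "last": "gaudreau", "goals": [9, 27, 1],
         "assists": [17, 46, 0], "gp": [26, 82, 1], "born": "1993/08/13"}),
])
print(body, end="")
```

The compact separators only keep the output identical to the request above; what matters for the bulk format is that nothing inside a single document spans more than one line.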

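I tested a multi-level aggregation (a terms aggregation on the first name, with a terms sub-aggregation on the last name and a bucket_selector) that tells me which first names are shared by players with more than one distinct last name. My query is: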

GET hockey/_search
{
  "size" : 0,
  "aggs": {
    "firstname": {
      "terms": {
        "field": "first.keyword"
      },
      "aggs": {
        "lastname": {
          "terms": {
            "field": "last.keyword"
          }
        },
        "minimum_2": {
          "bucket_selector": {
            "buckets_path": {
              "count": "lastname._bucket_count"
            },
            "script": "params.count >= 2"
          }
        }
      }
    }
  }
}
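To make explicit what this query computes, here is a small Python emulation over the (first, last) pairs from the bulk request above. It is an illustration under simplifying assumptions, not how Elasticsearch evaluates the aggregation: it ignores shard-level details and the terms aggregation's default `size` of 10 (harmless here, since there are only 7 distinct first names). The function name is hypothetical.

```python
from collections import defaultdict

# (first, last) pairs copied from the bulk request above
PLAYERS = [
    ("johnny", "gaudreau"), ("johnny", "monohan"), ("johnny", "gaudreau"),
    ("micheal", "frolik"), ("sam", "bennett"), ("sam", "bennett"),
    ("david", "jones"), ("tj", "brodie"), ("mark", "giordano"),
    ("tj", "backlund"), ("joe", "colborne"), ("david", "jones"),
]


def first_names_with_multiple_last_names(players, minimum=2):
    """Hypothetical emulation of terms(first) > terms(last) + bucket_selector(_bucket_count >= minimum)."""
    last_names = defaultdict(set)
    for first, last in players:
        last_names[first].add(last)
    # Keep only the "firstname" buckets whose "lastname" sub-aggregation has >= minimum buckets
    return {first: sorted(lasts) for first, lasts in last_names.items() if len(lasts) >= minimum}


selected = first_names_with_multiple_last_names(PLAYERS)
print(selected)   # {'johnny': ['gaudreau', 'monohan'], 'tj': ['backlund', 'brodie']}
print(len(selected))  # 2
```

The surviving keys match the two buckets in the response, and `len(selected)` is exactly the number the question is after.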

When I run it, I get the following:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 0.0,
    "hits" : [  ]
  },
  "aggregations" : {
    "firstname" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "johnny",
          "doc_count" : 3,
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "gaudreau",
                "doc_count" : 2
              },
              {
                "key" : "monohan",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "tj",
          "doc_count" : 2,
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "backlund",
                "doc_count" : 1
              },
              {
                "key" : "brodie",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

What I'd like to extract now (and then visualize) is HOW MANY first names of this kind there are in my index, i.e. how many buckets survive the bucket_selector. Basically, I need the size of the "buckets" array (which is 2 in my example).
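If the number only needs to be read by a client rather than shown in a visualization, it can be taken directly from the response. A minimal Python sketch, assuming the response has already been parsed into a dict; the trimmed `RESPONSE` below keeps only the fields the sketch reads, and `count_buckets` is a hypothetical helper.

```python
import json

# Trimmed copy of the response above: only the fields this sketch reads
RESPONSE = json.loads("""
{
  "aggregations": {
    "firstname": {
      "buckets": [
        {"key": "johnny", "doc_count": 3},
        {"key": "tj", "doc_count": 2}
      ]
    }
  }
}
""")


def count_buckets(response, agg_name):
    """Return the number of buckets in a top-level bucket aggregation of a search response."""
    return len(response["aggregations"][agg_name]["buckets"])


print(count_buckets(RESPONSE, "firstname"))  # 2
```

This does not make the value available to a visualization; it only helps where you control the code that consumes the search response.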

I tried the sum_bucket and stats_bucket pipeline aggregations without success.

There must be an easy way to do it.

Thank you!

elk_user188 · March 28, 2019, 2:27pm

I actually managed to get the value I wanted as a field under "aggregations" in the response by running the following:

GET hockey/_search
{
  "size" : 0,
  "aggs": {
    "firstname": {
      "terms": {
        "field": "first.keyword"
      },
      "aggs": {
        "lastname": {
          "terms": {
            "field": "last.keyword"
          }
        },
        "minimum_2": {
          "bucket_selector": {
            "buckets_path": {
              "count": "lastname._bucket_count"
            },
            "script": "params.count>=2"
          }
        },
        "my_count": {
          "cardinality": {
            "field": "first.keyword"
          }
        }
      }
    },
    "bucket_size" : {
      "sum_bucket": {
        "buckets_path": "firstname>my_count"
      }
    }
  }
}

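Because each firstname bucket contains a single first name, my_count is always 1, so summing it over the buckets kept by the bucket_selector gives the number of buckets. This is the output: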

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "firstname" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "johnny",
          "doc_count" : 3,
          "my_count" : {
            "value" : 1
          },
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "gaudreau",
                "doc_count" : 2
              },
              {
                "key" : "monohan",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "tj",
          "doc_count" : 2,
          "my_count" : {
            "value" : 1
          },
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "backlund",
                "doc_count" : 1
              },
              {
                "key" : "brodie",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    },
    "bucket_size" : {
      "value" : 2.0
    }
  }
}
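The trick in the query above can be checked with another small Python emulation (hypothetical helper, same (first, last) pairs as the bulk request): the cardinality of first.keyword inside a bucket keyed by first.keyword is 1, so sum_bucket over `firstname>my_count` equals the number of surviving buckets. This only mirrors the logic; it does not reproduce Elasticsearch's approximate cardinality algorithm.

```python
from collections import defaultdict

# (first, last) pairs copied from the bulk request in the question
PLAYERS = [
    ("johnny", "gaudreau"), ("johnny", "monohan"), ("johnny", "gaudreau"),
    ("micheal", "frolik"), ("sam", "bennett"), ("sam", "bennett"),
    ("david", "jones"), ("tj", "brodie"), ("mark", "giordano"),
    ("tj", "backlund"), ("joe", "colborne"), ("david", "jones"),
]


def bucket_size_via_sum_bucket(players, minimum=2):
    """Hypothetical emulation of terms(first) > [terms(last), bucket_selector,
    cardinality(first)] followed by sum_bucket over firstname>my_count."""
    last_names = defaultdict(set)
    first_names = defaultdict(set)
    for first, last in players:
        last_names[first].add(last)
        # Distinct values of first.keyword inside this bucket (the cardinality input)
        first_names[first].add(first)
    # bucket_selector: keep buckets whose lastname sub-aggregation has >= minimum buckets
    surviving = [first for first, lasts in last_names.items() if len(lasts) >= minimum]
    # my_count for each surviving bucket: always 1, because the bucket key is that first name
    my_counts = [len(first_names[first]) for first in surviving]
    # The response reports the sum as 2.0, hence the float
    return float(sum(my_counts))


print(bucket_size_via_sum_bucket(PLAYERS))  # 2.0
```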

How can I now visualize the value in bucket_size?

elk_user188 · April 1, 2019, 8:22am

Can't anybody answer this apparently simple question?

system · April 29, 2019, 8:22am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
