Return the bucket count after performing aggregations

Hi there!

I'd like to know the size of the buckets array so that I can visualize it (as a metric, for example).

Let's say I have an index generated as follows:

PUT hockey/player/_bulk?refresh
{"index":{"_id":1}}
{"first":"johnny","last":"gaudreau","goals":[9,27,1],"assists":[17,46,0],"gp":[26,82,1],"born":"1993/08/13"}
{"index":{"_id":2}}
{"first":"johnny","last":"monohan","goals":[7,54,26],"assists":[11,26,13],"gp":[26,82,82],"born":"1994/10/12"}
{"index":{"_id":3}}
{"first":"johnny","last":"gaudreau","goals":[5,34,36],"assists":[11,62,42],"gp":[24,80,79],"born":"1984/01/04"}
{"index":{"_id":4}}
{"first":"micheal","last":"frolik","goals":[4,6,15],"assists":[8,23,15],"gp":[26,82,82],"born":"1988/02/17"}
{"index":{"_id":5}}
{"first":"sam","last":"bennett","goals":[5,0,0],"assists":[8,1,0],"gp":[26,1,0],"born":"1996/06/20"}
{"index":{"_id":6}}
{"first":"sam","last":"bennett","goals":[0,26,15],"assists":[11,30,24],"gp":[26,81,82],"born":"1983/03/20"}
{"index":{"_id":7}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/10"}
{"index":{"_id":8}}
{"first":"tj","last":"brodie","goals":[2,14,7],"assists":[8,42,30],"gp":[26,82,82],"born":"1990/06/07"}
{"index":{"_id":39}}
{"first":"mark","last":"giordano","goals":[6,30,15],"assists":[3,30,24],"gp":[26,60,63],"born":"1983/10/03"}
{"index":{"_id":10}}
{"first":"tj","last":"backlund","goals":[3,15,13],"assists":[6,24,18],"gp":[26,82,82],"born":"1989/03/17"}
{"index":{"_id":11}}
{"first":"joe","last":"colborne","goals":[3,18,13],"assists":[6,20,24],"gp":[26,67,82],"born":"1990/01/30"}
{"index":{"_id":12}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/12"}
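The _bulk endpoint reads its body as newline-delimited JSON: each action line and each document source must sit on a single line, and the body must end with a newline. If the body is generated from code instead of pasted into the console, a minimal Python sketch like the following keeps every object on one line. It uses only the standard library; to_bulk_ndjson is a hypothetical helper, and only the first two sample documents are included for brevity.

```python
import json


def to_bulk_ndjson(docs):
    """Build a _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action metadata line, e.g. {"index": {"_id": 1}}
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Document source; json.dumps never emits raw newlines by default
        lines.append(json.dumps(source))
    # The bulk body has to be terminated by a newline
    return "\n".join(lines) + "\n"


# First two documents from the sample data (hypothetical subset)
docs = [
    (1, {"first": "johnny", "last": "gaudreau", "goals": [9, 27, 1],
         "assists": [17, 46, 0], "gp": [26, 82, 1], "born": "1993/08/13"}),
    (2, {"first": "johnny", "last": "monohan", "goals": [7, 54, 26],
         "assists": [11, 26, 13], "gp": [26, 82, 82], "born": "1994/10/12"}),
]

body = to_bulk_ndjson(docs)
print(body, end="")
```

Each printed line is a complete JSON object, which is what the bulk parser expects.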

I wrote a nested aggregation that shows which first names are paired with more than one distinct last name. My query is:

GET hockey/_search
{
  "size" : 0,
  "aggs": {
    "firstname": {
      "terms": {
        "field": "first.keyword"
      },
      "aggs": {
        "lastname": {
          "terms": {
            "field": "last.keyword"
          }
        },
        "minimum_2": {
          "bucket_selector": {
            "buckets_path": {
              "count": "lastname._bucket_count"
            },
            "script": "params.count >= 2"
          }
        }
      }
    }
  }
}
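To make the semantics explicit: lastname._bucket_count is the number of last-name buckets inside each first-name bucket, and bucket_selector drops every first-name bucket where that number is below 2. The following plain-Python emulation of that logic on the sample data is only a sketch (it is not Elasticsearch code, and first_names_with_min_last_names is a hypothetical name); it ignores details such as the terms aggregation's default size of 10 buckets and its ordering by doc_count.

```python
from collections import defaultdict

# (first, last) pairs taken from the sample bulk data
players = [
    ("johnny", "gaudreau"), ("johnny", "monohan"), ("johnny", "gaudreau"),
    ("micheal", "frolik"), ("sam", "bennett"), ("sam", "bennett"),
    ("david", "jones"), ("tj", "brodie"), ("mark", "giordano"),
    ("tj", "backlund"), ("joe", "colborne"), ("david", "jones"),
]


def first_names_with_min_last_names(pairs, minimum=2):
    # Outer "terms" on first name; the set plays the role of the inner
    # "terms" on last name (one entry per distinct last-name bucket)
    by_first = defaultdict(set)
    for first, last in pairs:
        by_first[first].add(last)
    # bucket_selector: keep a first-name bucket only if its inner
    # bucket count (lastname._bucket_count) is >= minimum
    return {first: sorted(lasts) for first, lasts in by_first.items()
            if len(lasts) >= minimum}


print(first_names_with_min_last_names(players))
# {'johnny': ['gaudreau', 'monohan'], 'tj': ['backlund', 'brodie']}
```

Note that sam and david each appear twice but with the same last name, so they have only one last-name bucket and are filtered out.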

And running it, I got the following:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 0.0,
    "hits" : [  ]
  },
  "aggregations" : {
    "firstname" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "johnny",
          "doc_count" : 3,
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "gaudreau",
                "doc_count" : 2
              },
              {
                "key" : "monohan",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "tj",
          "doc_count" : 2,
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "backlund",
                "doc_count" : 1
              },
              {
                "key" : "brodie",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}

What I'd like to extract now (and then visualize) is HOW MANY such first names are present in my index. Basically, I need the size of the top-level "buckets" array (which is 2 in my example).
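If the number only needs to be read by a script, rather than shown in a visualization, it can be computed client-side as the length of the top-level buckets array. A minimal Python sketch, assuming the response body is available as a JSON string (trimmed here to the fields that matter; count_buckets is a hypothetical helper):

```python
import json

# Trimmed copy of the search response (inner "lastname" buckets omitted)
response_text = """
{
  "aggregations": {
    "firstname": {
      "buckets": [
        {"key": "johnny", "doc_count": 3},
        {"key": "tj", "doc_count": 2}
      ]
    }
  }
}
"""


def count_buckets(response, agg_name):
    # Number of buckets left after the bucket_selector has filtered them
    return len(response["aggregations"][agg_name]["buckets"])


response = json.loads(response_text)
print(count_buckets(response, "firstname"))  # 2
```

This does not put the value into the aggregation output itself, so it does not by itself make the number available to a dashboard.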

I tried the sum_bucket and stats_bucket aggregations without success.

There must be an easy way to do it.

Thank you!

I eventually managed to get the value I wanted as a field under "aggregations" in the response by running the following:

GET hockey/_search
{
  "size" : 0,
  "aggs": {
    "firstname": {
      "terms": {
        "field": "first.keyword"
      },
      "aggs": {
        "lastname": {
          "terms": {
            "field": "last.keyword"
          }
        },
        "minimum_2": {
          "bucket_selector": {
            "buckets_path": {
              "count": "lastname._bucket_count"
            },
            "script": "params.count>=2"
          }
        },
        "my_count": {
          "cardinality": {
            "field": "first.keyword"
          }
        }
      }
    },
    "bucket_size" : {
      "sum_bucket": {
        "buckets_path": "firstname>my_count"
      }
    }
  }
}
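The workaround relies on the cardinality of first.keyword being 1 inside every first-name bucket, so summing it with sum_bucket simply counts the buckets. Because sum_bucket only sees the buckets that minimum_2 kept, the total is 2 rather than the 7 distinct first names in the index. A plain-Python sketch of that arithmetic on the sample data (an emulation of the aggregation's logic, not Elasticsearch code; variable names are illustrative):

```python
from collections import defaultdict

# (first, last) pairs from the sample bulk data
pairs = [
    ("johnny", "gaudreau"), ("johnny", "monohan"), ("johnny", "gaudreau"),
    ("micheal", "frolik"), ("sam", "bennett"), ("sam", "bennett"),
    ("david", "jones"), ("tj", "brodie"), ("mark", "giordano"),
    ("tj", "backlund"), ("joe", "colborne"), ("david", "jones"),
]

# terms on first.keyword: group documents by first name
buckets = defaultdict(list)
for first, last in pairs:
    buckets[first].append((first, last))

# bucket_selector (minimum_2): keep buckets with >= 2 distinct last names
kept = {f: docs for f, docs in buckets.items()
        if len({last for _, last in docs}) >= 2}

# cardinality(first.keyword) inside each kept bucket: always 1,
# because every document in a bucket shares the same first name
my_count = {f: len({first for first, _ in docs}) for f, docs in kept.items()}

# sum_bucket over firstname>my_count: adds 1 per surviving bucket
bucket_size = float(sum(my_count.values()))
print(my_count, bucket_size)  # {'johnny': 1, 'tj': 1} 2.0
```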

This produces the following output:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 12,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "firstname" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "johnny",
          "doc_count" : 3,
          "my_count" : {
            "value" : 1
          },
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "gaudreau",
                "doc_count" : 2
              },
              {
                "key" : "monohan",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "tj",
          "doc_count" : 2,
          "my_count" : {
            "value" : 1
          },
          "lastname" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "backlund",
                "doc_count" : 1
              },
              {
                "key" : "brodie",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    },
    "bucket_size" : {
      "value" : 2.0
    }
  }
}

How can I now visualize the value of bucket_size?

Can anybody answer this apparently simple question?
