Get total items in bucket in Elasticsearch

Using the sample data below, I would like to get the total number of users who own at least 2 phones on a histogram.

Let me explain with a simple example:

Create mappings

PUT sales
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "transactions": {
      "properties": {
        "username": {
          "type": "keyword"
        },
        "phone": {
          "type": "keyword"
        },
        "createdOn": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}

Sample Data:

POST /sales/transactions/_bulk
{ "index":{} }
{"username": "john","phone": "iPhone 5","createdOn": "2018-01-01 11:20:39"}
{ "index":{} }
{"username": "john","phone": "Samsung S4","createdOn": "2018-01-02 11:20:39"}
{ "index":{} }
{"username": "eve","phone": "iPhone X","createdOn": "2018-01-01 11:20:39"}
{ "index":{} }
{"username": "mary","phone": "iPhone 6","createdOn": "2018-01-03 11:20:39"}
{ "index":{} }
{"username": "joe","phone": "iPhone 7","createdOn": "2018-01-05 11:20:39"}
{ "index":{} }
{"username": "luke","phone": "iPhone 8","createdOn": "2018-01-07 11:20:39"}
{ "index":{} }
{"username": "aladin","phone": "Samsung S5","createdOn": "2018-01-08 11:20:39"}
{ "index":{} }
{"username": "aladin","phone": "Samsung S8","createdOn": "2018-01-08 11:20:39"}

We can get a list of users who own at least 2 phones by running the following query:

GET /sales/transactions/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "rich_users_over_time": {
      "date_histogram": {
        "field": "createdOn",
        "interval": "month",
        "format": "yyyy-MM",
        "min_doc_count": 1,
        "time_zone": "UTC"
      },
      "aggs": {
        "user": {
          "terms": {
            "size": 10,
            "field": "username",
            "min_doc_count": 2
          }
        }
      }
    }
  }
}

Result:

    "rich_users_over_time": {
      "buckets": [
        {
          "key_as_string": "2018-01",
          "key": 1514764800000,
          "doc_count": 8,
          "user": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "aladin",
                "doc_count": 2
              },
              {
                "key": "john",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }

I want the total count of users and not the list of each user. I need something like this:

  "aggregations": {
    "rich_users_over_time": {
      "buckets": [
        {
          "key_as_string": "2018-01",
          "key": 1514764800000,
          "doc_count": 4,
          "types_count": {
            "value": 2 <====== 2 users owning atleast 2 phones
          }
        }
      ]
    }
  }

While I could just iterate over the number of items in the bucket to get the total count, this is not a viable solution as I have thousands of terms in the bucket.

I have looked into using cardinality but that does not seem to solve the problem. I have a min_doc_count in my query that needs to match at least 2 docs.

Any help will be highly appreciated.

If you do not want to do the counting yourself on the client side, you can let Elasticsearch do it for you with a Pipeline aggregation. Pipeline aggregations are aggregations that run on the result of another aggregation, rather than on the values in your documents.

In this case you could use the stats_bucket aggregation. This aggregation will return a number of values, including a count. This count is the number of buckets. You could use this to get the number of users that occur at least twice per month:

GET /sales/transactions/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "rich_users_over_time": {
      "date_histogram": {
        "field": "createdOn",
        "interval": "month",
        "format": "yyyy-MM",
        "min_doc_count": 1,
        "time_zone": "UTC"
      },
      "aggs": {
        "user": {
          "terms": {
            "size": 10,
            "field": "username",
            "min_doc_count": 2
          }
        },
        "types_count": {
          "stats_bucket": {
            "buckets_path": "user>_count"
          }
        }
      }
    }
  }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.