Using the sample data below, I would like to get the total number of users who own at least 2 phones on a histogram.
Let me explain with a simple example:
Create mappings
PUT sales
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings": {
"transactions": {
"properties": {
"username": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"createdOn": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
}
Sample Data:
POST /sales/transactions/_bulk
{ "index":{} }
{"username": "john","phone": "iPhone 5","createdOn": "2018-01-01 11:20:39"}
{ "index":{} }
{"username": "john","phone": "Samsung S4","createdOn": "2018-01-02 11:20:39"}
{ "index":{} }
{"username": "eve","phone": "iPhone X","createdOn": "2018-01-01 11:20:39"}
{ "index":{} }
{"username": "mary","phone": "iPhone 6","createdOn": "2018-01-03 11:20:39"}
{ "index":{} }
{"username": "joe","phone": "iPhone 7","createdOn": "2018-01-05 11:20:39"}
{ "index":{} }
{"username": "luke","phone": "iPhone 8","createdOn": "2018-01-07 11:20:39"}
{ "index":{} }
{"username": "aladin","phone": "Samsung S5","createdOn": "2018-01-08 11:20:39"}
{ "index":{} }
{"username": "aladin","phone": "Samsung S8","createdOn": "2018-01-08 11:20:39"}
We can get a list of users who own at least 2 phones by running the following query:
GET /sales/transactions/_search
{
"size": 0,
"query": { "match_all": {} },
"aggs": {
"rich_users_over_time": {
"date_histogram": {
"field": "createdOn",
"interval": "month",
"format": "yyyy-MM",
"min_doc_count": 1,
"time_zone": "UTC"
},
"aggs": {
"user": {
"terms": {
"size": 10,
"field": "username",
"min_doc_count": 2
}
}
}
}
}
}
Result:
"rich_users_over_time": {
"buckets": [
{
"key_as_string": "2018-01",
"key": 1514764800000,
"doc_count": 8,
"user": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "aladin",
"doc_count": 2
},
{
"key": "john",
"doc_count": 2
}
]
}
}
]
}
}
I want the total count of users and not the list of each user. I need something like this:
"aggregations": {
"rich_users_over_time": {
"buckets": [
{
"key_as_string": "2018-01",
"key": 1514764800000,
"doc_count": 4,
"types_count": {
"value": 2 <====== 2 users owning atleast 2 phones
}
}
]
}
}
While I could just iterate over the number of items in the bucket to get the total count, this is not a viable solution as I have thousands of terms in the bucket.
I have looked into using cardinality but that does not seem to solve the problem. I have a min_doc_count
in my query that needs to match at least 2 docs.
Any help will be highly appreciated.