Hi,
I'm looking to run a SQL-like group by query with ElasticSearch. I have an index where each hit maps to an action with a user id. I would like to create a histogram grouped by the # of times a user completed a specific action. I currently query where actions where type = "foobar" and group by user_id. This produces the example output below, which indicates a bucket keyed on user id with a doc_count equal to the amount of actions that match for that user.
What I'm missing is then taking the key & doc_count result and bucketing into a histogram grouped by the count. For example, with the example output, the histogram bucket would be: [60 => 1, 63 => 2, 105 => 1, 133 => 1]. This indicates there are 2 users who triggered a doc_count of 63, a.k.a. have triggered that action 63 times. I'm missing the second part of this grouping query (maybe a pipeline aggregation?) to group by user action count.
Can someone help me understand where this is possible and whether it can then be used in a Kibana visualization? Thanks! Sean
Example Input
POST events-*/_search?size=0
{
"query": {
"bool": {
"must": [{
"query_string": {
"analyze_wildcard": true,
"query": "action=foobar"
}
}, {
"range": {
"@timestamp": {
"gte": 1486332269,
"lte": 1486937069,
"format": "epoch_second"
}
}
}],
"must_not": []
}
},
"aggs": {
"groupById": {
"terms": {
"field": "user_id.keyword",
"size": 5
}
}
}
}
Example Response:
{
"took": 3711,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"failed": 0
},
"hits": {
"total": 14271080,
"max_score": 0,
"hits": []
},
"aggregations": {
"groupById": {
"doc_count_error_upper_bound": 193,
"sum_other_doc_count": 14270656,
"buckets": [
{
"key": "user_1",
"doc_count": 133
},
{
"key": "user_2",
"doc_count": 105
},
{
"key": "user_3",
"doc_count": 63
},
{
"key": "user_4",
"doc_count": 63
},
{
"key": "user_5",
"doc_count": 60
}
]
}
}
}