Hi!
I have documents in Elasticsearch like:
{
  "email": "user@example.com",
  "subject": "Email subject",
  "body": "Email body"
}
And I'm trying to count all unique emails that have more than 10 documents.
I can retrieve all such emails with the following aggregation query:
GET emails/_search
{
  "size": 0,
  "aggs": {
    "email_with_more_than_10_docs": {
      "terms": {
        "field": "email",
        "min_doc_count": 10
      }
    }
  }
}
And I get the output:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 74,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "email_with_more_than_10_docs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "user1@example.com",
          "doc_count" : 40
        },
        {
          "key" : "user2@example.com",
          "doc_count" : 14
        },
        {
          "key" : "user3@example.com",
          "doc_count" : 13
        }
      ]
    }
  }
}
But how can I get the number of unique emails (the number of buckets)? In the example output above, that number is 3.
Tomo_M
(Tomohiro Mitani)
June 20, 2022, 6:50am
As long as the "size" parameter of the aggregation is larger than the number of emails that meet the criteria, counting the length of the "buckets" array on the client side is a viable option.
If you really need the count in the response, you can use the stats_bucket pipeline aggregation. Keep in mind to set "size" larger than the number of terms; the "count" value in the result is what you want.
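The client-side approach is a one-liner once the response is parsed. A minimal Python sketch, using the sample response from the question as an already-parsed dict:

```python
# Aggregation response from the question, abbreviated to the part we need.
response = {
    "aggregations": {
        "email_with_more_than_10_docs": {
            "buckets": [
                {"key": "user1@example.com", "doc_count": 40},
                {"key": "user2@example.com", "doc_count": 14},
                {"key": "user3@example.com", "doc_count": 13},
            ]
        }
    }
}

# The number of unique emails is simply the number of buckets returned.
buckets = response["aggregations"]["email_with_more_than_10_docs"]["buckets"]
unique_emails = len(buckets)
print(unique_emails)  # 3
```

This only counts correctly if "size" on the terms aggregation was large enough that no qualifying bucket was left out.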
GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size": 0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 10
      }
    },
    "stats": {
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}
{
  "aggregations" : {
    "stats" : {
      "count" : 10,
      "min" : 305.0,
      "max" : 691.0,
      "avg" : 416.1,
      "sum" : 4161.0
    }
  }
}
GET kibana_sample_data_flights/_search?filter_path=aggregations.stats
{
  "size": 0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestAirportID",
        "size": 1000
      }
    },
    "stats": {
      "stats_bucket": {
        "buckets_path": "dest>_count"
      }
    }
  }
}
{
  "aggregations" : {
    "stats" : {
      "count" : 156,
      "min" : 1.0,
      "max" : 691.0,
      "avg" : 83.71153846153847,
      "sum" : 13059.0
    }
  }
}
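Mapped back onto the emails index from the question, the same pattern would look roughly like this (a sketch, not tested against that mapping; "size": 1000 is an assumed upper bound on the number of qualifying emails). The min_doc_count filter removes the small buckets first, and stats_bucket then counts only the buckets that survive:

```json
GET emails/_search?filter_path=aggregations.unique_email_count
{
  "size": 0,
  "aggs": {
    "email_with_more_than_10_docs": {
      "terms": {
        "field": "email",
        "min_doc_count": 10,
        "size": 1000
      }
    },
    "unique_email_count": {
      "stats_bucket": {
        "buckets_path": "email_with_more_than_10_docs>_count"
      }
    }
  }
}
```

With the sample data above, the "count" field of unique_email_count would be the desired 3.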
But what if there are a lot of documents in Elasticsearch? Is there another solution that takes that into account?
Tomo_M
(Tomohiro Mitani)
July 13, 2022, 8:58am
For paginating aggregation results, a composite aggregation may help you.
Using a transform to perform the aggregation beforehand might be another option.
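A composite aggregation pages through every bucket deterministically, so the count no longer depends on a single large "size". A hedged sketch against the emails index from the question:

```json
GET emails/_search
{
  "size": 0,
  "aggs": {
    "unique_emails": {
      "composite": {
        "size": 1000,
        "sources": [
          { "email": { "terms": { "field": "email" } } }
        ]
      }
    }
  }
}
```

Note that composite aggregations do not support min_doc_count, so the client has to filter each page's buckets by doc_count (here, keep those above 10), add up the matches, and request the next page by passing the returned after_key as the "after" parameter until no buckets come back.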
system
(system)
Closed
August 10, 2022, 8:58am
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.