By default, the terms aggregation will return the buckets for the top ten terms ordered by the doc_count. One can change this default behaviour by setting the size parameter.
If I run a search like the one below, how can I be sure I get back the "largest" buckets by sum rather than by document count?
For example, I am parsing CloudFront logs to find bandwidth use by customer. index.html might be the most frequently requested document but only amount to a gig or so, while my-large-movie.mp4 would be the largest by total bytes but may have thousands fewer records.
# POST /_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "2021-07-01T00:00:00.000",
              "lt": "2021-08-01T00:00:00.000"
            }
          }
        },
        {
          "match": {
            "type": {
              "query": "assets"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_url": {
      "terms": {
        "field": "cs_uri_stem.keyword",
        "size": 100
      },
      "aggs": {
        "total_bytes": {
          "sum": {
            "field": "sc_bytes"
          }
        }
      }
    }
  }
}
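
One possible direction, offered as a sketch rather than a confirmed answer: the terms aggregation accepts an "order" parameter that can reference a single-value metric sub-aggregation by name, so the buckets could be ranked by total_bytes instead of doc_count. The fragment below reuses the by_url and total_bytes names from the request above and leaves out the query clause, which would stay unchanged. The shard_size value is an illustrative assumption, not a recommendation. Ordering by a sub-aggregation is computed per shard before the results are merged, so the top buckets can be approximate on a multi-shard index, and a larger shard_size may reduce that error.

```json
{
  "size": 0,
  "aggs": {
    "by_url": {
      "terms": {
        "field": "cs_uri_stem.keyword",
        "size": 100,
        "shard_size": 1000,
        "order": { "total_bytes": "desc" }
      },
      "aggs": {
        "total_bytes": {
          "sum": {
            "field": "sc_bytes"
          }
        }
      }
    }
  }
}
```

With this ordering, a URL such as my-large-movie.mp4 would be expected to rank ahead of index.html whenever its summed sc_bytes is larger, regardless of how many records each has.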