How does Elasticsearch order buckets when aggregating by sum?

By default, the terms aggregation returns the buckets for the top ten terms, ordered by `doc_count`. You can change the number of buckets returned by setting the `size` parameter.

If I run a search like the one below, how can I be sure I get back the buckets that are "largest" by sum?

For example, I am parsing CloudFront logs to find bandwidth use by customer. index.html might be the most frequently requested document but only amount to a gig or so in total, while my-large-movie.mp4 could be the largest by total bytes despite having thousands fewer records.

```
# POST /_search
{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "gte": "2021-07-01T00:00:00.000",
                            "lt": "2021-08-01T00:00:00.000"
                        }
                    }
                },
                {
                    "match": {
                        "type": {
                            "query": "assets"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "by_url": {
            "terms": {
                "field": "cs_uri_stem.keyword",
                "size" : 100
            },
            "aggs": {
                "total_bytes": {
                    "sum": {
                        "field": "sc_bytes"
                    }
                }
            }
        }
    }
}
```

See the `order` parameter in the terms aggregation docs, and the example for "Ordering the buckets by single value metrics sub-aggregation".
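A minimal sketch of what that looks like for the query in the question, assuming the same index and fields (`cs_uri_stem.keyword`, `sc_bytes`): the only change is an `order` clause on the `by_url` terms aggregation that points at the `total_bytes` sum sub-aggregation, so the 100 buckets returned are the URLs with the most bytes rather than the most requests. The request is sent to `/_search` as before.

```json
{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "gte": "2021-07-01T00:00:00.000",
                            "lt": "2021-08-01T00:00:00.000"
                        }
                    }
                },
                { "match": { "type": { "query": "assets" } } }
            ]
        }
    },
    "aggs": {
        "by_url": {
            "terms": {
                "field": "cs_uri_stem.keyword",
                "size": 100,
                "order": { "total_bytes": "desc" }
            },
            "aggs": {
                "total_bytes": {
                    "sum": { "field": "sc_bytes" }
                }
            }
        }
    }
}
```

Each shard computes its own top terms before the results are merged, so when the data is spread across several shards, ordering by a `sum` can be approximate. Raising the terms aggregation's `shard_size` above `size` is the usual way to reduce that error.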

Wow, I am blind; it's right on the page I linked.

Thanks and sorry :frowning:

