Sum aggregation question

(Todd Merritt) #1

I'm trying to use the Elasticsearch integration in PCP to load metrics into ES and visualize them via Grafana. It's mostly working, except for some of the stats that PCP aggregates as associative arrays.
Load averages are stored as

"@instances": [
                {
                  "load": 8,
                  "@id": "1 minute"
                },
                {
                  "load": 8,
                  "@id": "5 minute"
                },
                {
                  "load": 8.1,
                  "@id": "15 minute"
                }
              ],

for instance. When I try to visualize that with the query below though, ES sums all of the 1, 5, and 15 minute values together and returns a single identical value for all three metrics. Is there a way to get ES to sum the three values separately?

{
    "size":0,
    "query":{
        "bool":{
            "filter":[
                {
                    "range":{
                        "@timestamp":{
                            "gte":"1557750800000",
                            "lte":"1557750940001",
                            "format":"epoch_millis"
                        }
                    }
                },
                {
                    "query_string":{
                        "analyze_wildcard":true,
                        "query":"@host-id:cpu1"
                    }
                }
            ]
        }
    },
    "aggs":{
        "3":{
            "terms":{
                "field":"kernel.all.@instances.@id.keyword",
                "size":100,
                "order":{
                    "_term":"desc"
                },
                "min_doc_count":1
            },
            "aggs":{
                "2":{
                    "date_histogram":{
                        "interval":"1m",
                        "field":"@timestamp",
                        "min_doc_count":0,
                        "extended_bounds":{
                            "min":"1557750800000",
                            "max":"1557750940001"
                        },
                        "format":"epoch_millis"
                    },
                    "aggs":{
                        "1":{
                            "sum":{
                                "field":"kernel.all.@instances.load"
                            }
                        }
                    }
                }
            }
        }
    }
}

(Zachary Tong) #2

So in this case, the issue is the array of objects. In Elasticsearch, arrays of objects are "flattened" into one multi-valued field per property, and the pairing between values that came from the same object is not maintained. So the relationship between the load and the ID is lost during the flattening.
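
To make the flattening concrete, here is a rough Python sketch of what effectively happens to the load-average document from the question. It is a simplified model of the behaviour, not Elasticsearch's actual indexing code, and the helper name `flatten_object_array` is made up for illustration.

```python
from collections import defaultdict


def flatten_object_array(prefix, objects):
    """Simplified model of indexing a non-nested array of objects:
    each property becomes one multi-valued field on the document."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{prefix}.{key}"].append(value)
    return dict(flat)


# The "@instances" array from the question.
instances = [
    {"load": 8, "@id": "1 minute"},
    {"load": 8, "@id": "5 minute"},
    {"load": 8.1, "@id": "15 minute"},
]

flat = flatten_object_array("kernel.all.@instances", instances)

# A terms bucket for any single @id matches the whole document, so the
# sum inside that bucket sees every load value the document holds.
sums = {
    instance_id: sum(flat["kernel.all.@instances.load"])
    for instance_id in flat["kernel.all.@instances.@id"]
}

print(flat)
print(sums)
```

Every bucket ends up with the same total because nothing in the flattened document records which load belonged to which @id.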

This part of the Guide is quite old, so some of the syntax might be outdated, but the explanation of the phenomenon is still valid: https://www.elastic.co/guide/en/elasticsearch/guide/current/complex-core-fields.html#object-arrays

Essentially, you'll need to either use one of the relational field types (the nested datatype, or parent/child), or just use different fields like 5_min_load, 15_min_load, etc.
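
For the nested route, a minimal sketch of what the mapping and aggregation might look like, written as Python dicts that serialize to the JSON request bodies. This assumes the typeless mapping format of Elasticsearch 7+, and it assumes `@id` is mapped as `keyword` and `load` as `float`. The aggregation names (`instances`, `by_id`, `load_sum`) and both function names are arbitrary, and the time filter and date histogram from the original query are left out for brevity.

```python
import json

NESTED_PATH = "kernel.all.@instances"


def nested_mapping():
    """Index mapping that keeps each {"@id", "load"} pair together
    by declaring the array as a nested field."""
    return {
        "mappings": {
            "properties": {
                "kernel": {"properties": {"all": {"properties": {
                    "@instances": {
                        "type": "nested",
                        "properties": {
                            "@id": {"type": "keyword"},   # assumed type
                            "load": {"type": "float"},    # assumed type
                        },
                    }
                }}}}
            }
        }
    }


def per_instance_sum_query():
    """Sum the load separately for each @id via a nested aggregation."""
    return {
        "size": 0,
        "aggs": {
            "instances": {
                "nested": {"path": NESTED_PATH},
                "aggs": {
                    "by_id": {
                        "terms": {"field": f"{NESTED_PATH}.@id", "size": 100},
                        "aggs": {
                            "load_sum": {"sum": {"field": f"{NESTED_PATH}.load"}}
                        },
                    }
                },
            }
        },
    }


print(json.dumps(per_instance_sum_query(), indent=2))
```

The mapping has to be in place before the documents are indexed, so existing data would need a reindex.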

(Todd Merritt) #3

Thanks for the confirmation. I ended up writing my own metric agent for PCP that stores them in separate flat values.
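
For anyone landing here later, the flat-field idea can be sketched roughly as below. This is not the actual agent code; the function name `to_flat_fields` and the resulting field names (`load_1_minute` and so on) are only illustrative.

```python
def to_flat_fields(metric, instances):
    """Turn a PCP-style instance array into one field per instance,
    e.g. {"load": 8, "@id": "1 minute"} -> {"load_1_minute": 8}."""
    flat = {}
    for inst in instances:
        suffix = inst["@id"].replace(" ", "_")
        flat[f"{metric}_{suffix}"] = inst[metric]
    return flat


instances = [
    {"load": 8, "@id": "1 minute"},
    {"load": 8, "@id": "5 minute"},
    {"load": 8.1, "@id": "15 minute"},
]

doc = to_flat_fields("load", instances)
print(doc)
# {'load_1_minute': 8, 'load_5_minute': 8, 'load_15_minute': 8.1}
```

With one field per instance, a plain sum (or avg) aggregation on each field gives the three series separately, with no nested mapping needed.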