Derivative aggregation to convert a counter into a rate

Hi everyone,

We are using collectd to gather NIC stats from multiple hosts and ship them to our ES cluster using Logstash. The collected data looks like this:

{"@timestamp":"2016-05-25T10:00:30.000+0100", "host":"host1", "collectd_type":"if_octets", "rx": 1000}
{"@timestamp":"2016-05-25T10:00:30.000+0100", "host":"host2", "collectd_type":"if_octets", "rx": 2000}
{"@timestamp":"2016-05-25T10:00:40.000+0100", "host":"host1", "collectd_type":"if_octets", "rx": 1000}

Data is received every 10 seconds, one record per host. The rx field is an ever-increasing counter holding the total number of bytes received by the NIC since the machine came up. Nothing new here for those familiar with collectd...
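To make the counter semantics concrete, here is a minimal Python sketch (not collectd or Elasticsearch code) that turns successive readings of such a counter into bytes per second by dividing the difference between consecutive readings by the elapsed time. It assumes the samples are sorted by time and that the counter never wraps or resets; the function name counter_to_rates and every reading after the first one are hypothetical.

```python
from datetime import datetime


def counter_to_rates(samples):
    """Convert (timestamp, counter) pairs into (timestamp, bytes_per_second).

    Assumes samples are sorted by time and the counter never resets.
    The first sample has no predecessor, so it yields no rate.
    """
    rates = []
    for (t_prev, c_prev), (t_cur, c_cur) in zip(samples, samples[1:]):
        elapsed = (t_cur - t_prev).total_seconds()
        rates.append((t_cur, (c_cur - c_prev) / elapsed))
    return rates


# Hypothetical readings for one host, 10 s apart (values after the first are made up).
readings = [
    (datetime(2016, 5, 25, 10, 0, 30), 1000),
    (datetime(2016, 5, 25, 10, 0, 40), 6000),
    (datetime(2016, 5, 25, 10, 0, 50), 8000),
]

for ts, rate in counter_to_rates(readings):
    print(ts.time(), rate)
```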

We'd like to draw a graph of the bandwidth (bytes per second) used by each host over time. For this we use a date_histogram aggregation followed by a derivative to compute the number of bytes received during each bucket period. The aggs part of the query looks like this (only the relevant parts are included):

"aggs": {
  "4": {
    "terms": { "field": "host" },
    "aggs": {
      "2": {
        "date_histogram": {
          "interval": "10s",
          "field": "@timestamp"
        },
        "aggs": {
          "1": { "max": { "field": "rx" } },
          "3": { "derivative": { "buckets_path": "1", "unit": "second" } }
        }
      }
    }
  }
}

As I understand it, here is what happens:

  • the terms aggregation creates a bucket for each host with matching documents
  • the date_histogram aggregation further splits every host bucket into smaller 10s buckets based on the @timestamp field
  • for each 10s period (and for each host), the max aggregation keeps only the highest rx value found in that time range. Since the value is an increasing counter, we could have used min as well...
  • finally, the derivative returns the difference between the max values of two consecutive buckets, which is the number of bytes received during that period (with "unit": "second" it also returns a normalized_value, i.e. that difference expressed per second).

(stop me here if I'm wrong...)
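The steps above can be mimicked in plain Python to check the reasoning. This is only a rough in-memory model, not Elasticsearch code: it assumes documents shaped like the collectd samples with @timestamp already parsed into timezone-aware datetimes, the helper names and the rx values at 10:00:40 are hypothetical, and it ignores Elasticsearch details such as empty buckets and gap_policy.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET_SECONDS = 10                 # date_histogram interval
TZ = timezone(timedelta(hours=1))   # the samples use +0100


def bucket_key(ts):
    """Floor an aware datetime to the start of its 10 s bucket (epoch seconds)."""
    epoch = int(ts.timestamp())
    return epoch - epoch % BUCKET_SECONDS


def per_host_rates(docs):
    """Return {host: [(bucket_start, bytes_in_bucket, bytes_per_second), ...]}."""
    # terms + date_histogram + max: highest rx per (host, bucket)
    max_rx = defaultdict(dict)
    for doc in docs:
        key = bucket_key(doc["@timestamp"])
        buckets = max_rx[doc["host"]]
        buckets[key] = max(buckets.get(key, doc["rx"]), doc["rx"])

    # derivative: difference between consecutive buckets of the same host,
    # plus that difference divided by the 10 s bucket width
    result = {}
    for host, buckets in max_rx.items():
        keys = sorted(buckets)
        result[host] = [
            (cur, buckets[cur] - buckets[prev],
             (buckets[cur] - buckets[prev]) / BUCKET_SECONDS)
            for prev, cur in zip(keys, keys[1:])
        ]
    return result


# Hypothetical documents; the 10:00:40 values are made up for illustration.
docs = [
    {"@timestamp": datetime(2016, 5, 25, 10, 0, 30, tzinfo=TZ), "host": "host1", "rx": 1000},
    {"@timestamp": datetime(2016, 5, 25, 10, 0, 30, tzinfo=TZ), "host": "host2", "rx": 2000},
    {"@timestamp": datetime(2016, 5, 25, 10, 0, 40, tzinfo=TZ), "host": "host1", "rx": 1500},
    {"@timestamp": datetime(2016, 5, 25, 10, 0, 40, tzinfo=TZ), "host": "host2", "rx": 2300},
]

for host, rows in sorted(per_host_rates(docs).items()):
    for start, delta, rate in rows:
        print(host, datetime.fromtimestamp(start, TZ), delta, rate)
```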

So far so good. That query gives a breakdown of the bandwidth used per host. Now the question: how should I write the query to get the total bandwidth across all hosts?

Simply removing the terms aggregation on the host field does not work, because it breaks the max and derivative steps that convert the counter value into a rate: the max would then be taken across the counters of all hosts. As far as I understand, I should be able to build something like:

  • create buckets from the date_histogram
  • split each bucket per host
  • for each host, find the maximum value (or the last one, based on timestamp)
  • compute the derivative for each host, per bucket, to get the number of bytes received during the time period
  • then sum these per-host byte counts for each bucket
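The desired steps can be sketched in plain Python as well, next to what happens when the per-host split is dropped. This models the computation only and is not an Elasticsearch query; the helper names and sample values are hypothetical, and it assumes every host reports in every bucket and that no counter resets.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET_SECONDS = 10
TZ = timezone(timedelta(hours=1))


def bucket_key(ts):
    """Floor an aware datetime to the start of its 10 s bucket (epoch seconds)."""
    epoch = int(ts.timestamp())
    return epoch - epoch % BUCKET_SECONDS


def max_per_bucket_and_host(docs):
    """Steps 1-3: {bucket_start: {host: max rx}}."""
    out = defaultdict(dict)
    for d in docs:
        hosts = out[bucket_key(d["@timestamp"])]
        hosts[d["host"]] = max(hosts.get(d["host"], d["rx"]), d["rx"])
    return out


def total_bytes_per_bucket(docs):
    """Steps 4-5: per-host deltas between consecutive buckets, summed per bucket."""
    per_bucket = max_per_bucket_and_host(docs)
    keys = sorted(per_bucket)
    totals = {}
    for prev, cur in zip(keys, keys[1:]):
        totals[cur] = sum(
            per_bucket[cur][h] - per_bucket[prev][h]
            for h in per_bucket[cur]
            if h in per_bucket[prev]  # guard: skip hosts absent from the previous bucket
        )
    return totals


def naive_bytes_per_bucket(docs):
    """Without the per-host split: max over all hosts, then delta between buckets."""
    maxima = defaultdict(int)
    for d in docs:
        k = bucket_key(d["@timestamp"])
        maxima[k] = max(maxima[k], d["rx"])
    keys = sorted(maxima)
    return {cur: maxima[cur] - maxima[prev] for prev, cur in zip(keys, keys[1:])}


def make_doc(second, host, rx):
    """Hypothetical helper building a collectd-style document."""
    return {"@timestamp": datetime(2016, 5, 25, 10, 0, second, tzinfo=TZ),
            "host": host, "rx": rx}


docs = [
    make_doc(30, "host1", 1000), make_doc(30, "host2", 2000),
    make_doc(40, "host1", 1500), make_doc(40, "host2", 2300),
]

for start, total in total_bytes_per_bucket(docs).items():
    print(datetime.fromtimestamp(start, TZ), total, total / BUCKET_SECONDS)
```

With these sample values the per-host pipeline reports 800 bytes (80 bytes/s) for the 10:00:40 bucket, while the naive version reports only 300, i.e. just host2's growth, because taking the max across hosts keeps only the largest counter. Since differencing is linear, the same 800 also falls out of differencing the per-bucket sum of the per-host maxima (3800 − 3000), as long as the same hosts appear in both buckets.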

Unfortunately, I have not been able to construct such a query.
Any help would be appreciated.


I think you are after this one -