Sum network traffic

Hi,

Since this issue has been closed by @jsoriano, I'm asking for a solution here.
I was just trying out Metricbeat to see if I can log outgoing traffic in an LXC container. I couldn't specify any option other than the interface (e.g., ports), but I let that go. Unfortunately, after enabling the system network metricset, I still couldn't determine the outgoing traffic for a specific time period. I don't need to visualize it; I simply need a sum.
The hard part is that whenever the container restarts, the counter resets. Of course, this isn't Metricbeat's fault, but there isn't any way to query the data from ES where I can say that I need all the "max" values between X and Y, except for the first one, where I need to take the first document and subtract its value from the first "max".
I was hoping there would be an optional setting for non-incremental values for those who need this.
I played around a bit with derivative aggregations, but I couldn't get the result I needed.
My question is: am I missing something, or is this simply impossible with Metricbeat?

Thanks!

Hello @YvorL! I believe you can compute a sum in Kibana by creating a Visual Builder visualization.

Hi @Kaiyan_Sheng!
That may be true, but I need an ES query solution, since I need to forward the response to an application on a per-container basis (we're talking about hundreds of containers). So, unfortunately, visualizing it won't help in my case. :frowning_face:

I'm not sure I fully understand yet what you need. So the values are counters. If you want to know the total network traffic between points a and b, you are interested in the value b - a?

The values are counters, but every time a container restarts (which happens randomly), the counter restarts from 0. So if I want the SUM over a 15-day interval during which the virtual network card's counter was reset 3 times (on the 4th, 9th and 14th days), I'd have 4 maximums (assuming I can extract them):

  • 4th day maximum (A)
  • 9th day maximum (B)
  • 14th day maximum (C)
  • 15th day maximum (D)

I'd need to subtract the value of the first document on day 1 from A to get the first subperiod's traffic, then add the B and C maximums, and finally subtract the value of the first document after the C maximum from D to get the last subperiod. Adding these up gives the total for those 15 days. And this is for only one container, over a short period, with only 3 restarts at known points for the example's sake. Unfortunately, reality is far more complicated :frowning:
If the values in the documents were non-incremental, it'd be sooooo easy.
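The per-subperiod arithmetic above can be written down as a minimal Python sketch. It assumes the samples for one container are already fetched and sorted by @timestamp, that any decrease in the counter means the container restarted, and that each restarted counter begins at 0, so only the first segment (which may carry traffic from before the window) has its starting value subtracted. The function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp, cumulative out-bytes counter)


def split_on_resets(samples: List[Sample]) -> List[List[Sample]]:
    """Split time-ordered samples into segments; a drop in the counter marks a reset."""
    segments: List[List[Sample]] = []
    for ts, value in samples:
        if not segments or value < segments[-1][-1][1]:
            segments.append([])  # first sample, or the counter went down: new segment
        segments[-1].append((ts, value))
    return segments


def total_from_segment_maxima(samples: List[Sample]) -> float:
    """Sum each segment's maximum, minus the window's starting value for segment 1."""
    total = 0.0
    for i, seg in enumerate(split_on_resets(samples)):
        peak = max(v for _, v in seg)
        # Assumption: segments after a restart started counting from 0.
        baseline = seg[0][1] if i == 0 else 0.0
        total += peak - baseline
    return total
```

For the series 100, 150, 200, 30, 80, 10, 60 (two restarts), this gives (200 - 100) + 80 + 60 = 240.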

What should work here is to first take the derivative of all the buckets and then sum them, which should lead to the expected result. On the visualization side, to make sure the result is correct, we skip negative values when the counters are reset to 0.
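To make the suggestion concrete, here is a minimal Python sketch of the same arithmetic done client side, assuming `values` is the time-ordered series of per-bucket counter values for one container. Skipping negative deltas also drops whatever was counted between the restart and the first sample after it; the hypothetical `count_value_after_reset` flag adds that value back, under the assumption that the counter restarts from 0.

```python
from typing import List, Optional


def derivatives(values: List[float]) -> List[Optional[float]]:
    """Difference to the previous value; the first entry has no predecessor."""
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


def sum_of_deltas(values: List[float], count_value_after_reset: bool = False) -> float:
    total = 0.0
    for value, delta in zip(values, derivatives(values)):
        if delta is None:
            continue
        if delta >= 0:
            total += delta
        elif count_value_after_reset:
            # Assumption: the counter restarted from 0, so its new value is all new traffic.
            total += value
        # otherwise the negative delta (a reset) is skipped
    return total
```

For 100, 150, 200, 30, 80, 10, 60, skipping negatives gives 200, while counting the post-reset values gives 240, the same as the per-segment maximum approach.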

Thank you @ruflin for keeping this alive! :slight_smile:
So I tinkered with this again and added the sum bucket aggregation.

```json
{
  "aggs": {
    "out_bytes_10m": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m"
      },
      "aggs": {
        "obytes": {
          "sum": {
            "field": "system.network.out.bytes"
          }
        },
        "obytes_deriv": {
          "derivative": {
            "buckets_path": "obytes",
            "unit": "1m"
          }
        }
      }
    },
    "sum_obytes": {
      "sum_bucket": {
        "buckets_path": "out_bytes_10m>obytes_deriv"
      }
    }
  }
}
```
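One way to combine the derivative and skip-negatives suggestions in a single request is sketched below in Python, building the request body as a dict. It is a hedged sketch that has not been run against a cluster: it swaps the per-bucket `sum` for `max` (several cumulative samples can land in one bucket), clamps negative derivatives (resets) to 0 with a `bucket_script`, and sums the clamped values with `sum_bucket`. The function name, the `per_interval`/`obytes_pos` aggregation names and the default `1h` interval are hypothetical; restricting the query to one container is left out because the right field depends on the setup.

```python
import json


def build_traffic_query(start: str, end: str, interval: str = "1h") -> dict:
    """Hypothetical request body: max per bucket -> derivative -> clamp -> sum."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"range": {"@timestamp": {"gte": start, "lt": end}}},
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {
                    # max instead of sum: summing cumulative samples inflates the value
                    "obytes": {"max": {"field": "system.network.out.bytes"}},
                    "obytes_deriv": {"derivative": {"buckets_path": "obytes"}},
                    # counter resets show up as negative derivatives; clamp them to 0
                    "obytes_pos": {
                        "bucket_script": {
                            "buckets_path": {"d": "obytes_deriv"},
                            "script": "params.d > 0 ? params.d : 0",
                        }
                    },
                },
            },
            "sum_obytes": {"sum_bucket": {"buckets_path": "per_interval>obytes_pos"}},
        },
    }


print(json.dumps(build_traffic_query("now-15d", "now"), indent=2))
```

On Elasticsearch 7.2 and later, the `date_histogram` `interval` parameter is replaced by `fixed_interval`/`calendar_interval`. Adding `filter_path=aggregations.sum_obytes.value` to the search URL trims the response down to the single summed value.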

While I get a partially correct value, my issues are the following:

  • I need to set the interval to the same value as the metric collection period. For example, if I want to check a day's traffic and I gather the metrics every minute, I'll get 1,440 buckets (which are irrelevant to me). For a month, it'll be around 43K. For one container. That seems pretty wasteful :frowning:
    If I change the time interval, I get really confusing numbers, even if I specify a unit in the derivative aggregation. Also, I'm not sure how to tell the sum bucket aggregation to use normalized_value instead of value. Though that wouldn't help me either, because the results aren't useful at all. Or, as I said before, I might be missing something important :slight_smile:
  • When a container restarts, I lose the first reported value, because the derivative doesn't count the "null" buckets where there isn't any document in that time interval. That wouldn't be a huge issue if I could keep the interval reasonably low (e.g., 1m). But if I raised it (e.g., to 1h) to avoid creating a huge number of buckets (the first issue), I'd risk losing important data.
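The confusing numbers in the first point are likely, at least in part, a consequence of using `sum` inside each bucket: a cumulative counter summed over several samples grows with the number of samples per bucket, so the derivative of that sum depends on the interval. The small Python simulation below uses hypothetical data (one sample every 10 seconds, 100 bytes per sample period, no restarts) and a simplified bucketing to compare `sum` and `max` per bucket; it is not how Elasticsearch computes the aggregations internally.

```python
from typing import Callable, Dict, List, Tuple


def bucket_counter(samples: List[Tuple[int, float]], width: int,
                   agg: Callable[[List[float]], float]) -> List[float]:
    """Group (seconds, counter) samples into fixed-width buckets and aggregate each."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in samples:
        buckets.setdefault(ts // width, []).append(value)
    return [agg(buckets[k]) for k in sorted(buckets)]


def positive_delta_sum(values: List[float]) -> float:
    """Derivative between consecutive buckets, negatives clamped to 0, then summed."""
    return sum(max(cur - prev, 0.0) for prev, cur in zip(values, values[1:]))


# Hypothetical data: a counter sampled every 10 s that grows by 100 bytes per sample.
samples = [(t, 1000.0 + 10 * t) for t in range(0, 600, 10)]

for width in (10, 60, 300):
    via_sum = positive_delta_sum(bucket_counter(samples, width, sum))
    via_max = positive_delta_sum(bucket_counter(samples, width, max))
    print(f"{width:>3} s buckets: sum-based {via_sum:.0f}, max-based {via_max:.0f}")
```

With these data the true traffic between the first and last sample is 5,900 bytes. The max-based total is 5,900, 5,400 and 3,000 bytes for 10 s, 60 s and 300 s buckets, missing only what was sent inside the first bucket before its maximum, while the sum-based total jumps to 32,400 and 90,000 bytes. A restart that falls inside a coarse bucket would still be undercounted with `max`.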

Empty bucket sample:

```
...
{
  "key_as_string" : "2018-12-01T13:06:00.000Z",
  "key" : 1543669560000,
  "doc_count" : 0,
  "obytes" : {
    "value" : 0.0
  },
  "obytes_deriv" : {
    "value" : null
  }
},
{
  "key_as_string" : "2018-12-01T13:06:10.000Z",
  "key" : 1543669570000,
  "doc_count" : 1,
  "obytes" : {
    "value" : 41047.0
  },
  "obytes_deriv" : {
    "value" : null
  }
},
{
  "key_as_string" : "2018-12-01T13:06:20.000Z",
  "key" : 1543669580000,
  "doc_count" : 1,
  "obytes" : {
    "value" : 48577.0
  },
  "obytes_deriv" : {
    "value" : 7530.0
  }
},
...
```
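The null derivatives in the sample can be reproduced with the rough Python model below, which treats a bucket with `doc_count` 0 as a gap. It is modeled only on the sample output above, not on the Elasticsearch source. It also contrasts the derivative aggregation's `insert_zeros` gap policy as this model assumes it behaves: the bucket after the gap then gets the whole counter value (41047), which is right when the gap is a restart from 0, but over-counts when the gap is only a missed sample on a counter that did not reset, and the step into the gap becomes a large negative value that would need clamping.

```python
from typing import List, Optional


def derivative(buckets: List[dict], gap_policy: str = "skip") -> List[Optional[float]]:
    """Rough model of a derivative over buckets where doc_count == 0 marks a gap."""
    out: List[Optional[float]] = []
    prev: Optional[float] = None
    for b in buckets:
        if b["doc_count"] == 0:
            if gap_policy == "skip":
                out.append(None)
                prev = None  # modeled: the bucket after a gap has no predecessor
                continue
            value = 0.0  # "insert_zeros": the empty bucket counts as 0
        else:
            value = b["obytes"]
        out.append(None if prev is None else value - prev)
        prev = value
    return out


# The three buckets from the sample above.
sample = [
    {"doc_count": 0, "obytes": 0.0},
    {"doc_count": 1, "obytes": 41047.0},
    {"doc_count": 1, "obytes": 48577.0},
]
print(derivative(sample))
print(derivative(sample, gap_policy="insert_zeros"))
```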

Do you have any idea how to solve this? I'd be happy either with simply getting the sum_obytes value while keeping the metric interval short, or with being able to change the query interval so I don't get flooded with buckets.

Hi @YvorL, just wanted to let you know that this is still on my list to look into, but I haven't gotten to it yet :frowning:


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.