Hi, I have noticed that one machine has a higher rate of dropped packets than the others: roughly 1% packet loss, while the other machines are well below 1%.
I.e:
Machine 1: 14 dropped packets out of 200 million.
Machine 2: 2 million dropped packets out of 200 million.
You see "dropped": 2750373. Is this number cumulative over the uptime of the machine, or is it how many packets were dropped at that particular timestamp?
How a system knows packets are lost is something I haven't reviewed in several years (we covered it in a Wireshark class), but Google can help with that; for example: https://likegeeks.com/fix-packet-loss/
Debugging packet loss is probably a topic for another forum.
@javadevmtl It should be a monotonically increasing number: as packets go across the wire, the system increments counters for errors, packets, dropped packets, and bytes. Metricbeat samples these counters and records them to Elasticsearch. To view this as a rate, you will need to apply a derivative pipeline aggregation inside a date_histogram aggregation. If you need the total number of packets for a specific time period, you will need to subtract the min from the max using a bucket_script.
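A minimal sketch of the rate approach, sent to a `metricbeat-*` `_search` endpoint. The field name `system.network.in.dropped` and the one-minute `fixed_interval` (Elasticsearch 7.x syntax) are assumptions; adjust them to your mapping and version:

```json
{
  "size": 0,
  "aggs": {
    "per_minute": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1m"
      },
      "aggs": {
        "dropped_max": { "max": { "field": "system.network.in.dropped" } },
        "dropped_rate": {
          "derivative": { "buckets_path": "dropped_max" }
        }
      }
    }
  }
}
```

Each bucket's `dropped_rate` is the increase in the cumulative counter versus the previous bucket, i.e. drops per minute.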
Here is an example of sampling the entire time range:
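A minimal sketch, assuming the Metricbeat field `system.network.in.dropped` and grouping by `beat.hostname` (both assumptions, adjust to your mapping):

```json
{
  "size": 0,
  "aggs": {
    "per_host": {
      "terms": { "field": "beat.hostname" },
      "aggs": {
        "first": { "min": { "field": "system.network.in.dropped" } },
        "last":  { "max": { "field": "system.network.in.dropped" } },
        "total_dropped": {
          "bucket_script": {
            "buckets_path": { "first": "first", "last": "last" },
            "script": "params.last - params.first"
          }
        }
      }
    }
  }
}
```

`total_dropped` is the max-minus-min of the cumulative counter per host over whatever time range the query matches.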
You will want to change the query to limit this to a specific time range and host. The unfortunate part of bucket_script is that you have to run it inside a multi-bucket aggregation like date_histogram or a terms aggregation.
Hi, thanks. I looked at the sample Kibana dashboard that Metricbeat installs and came up with something. I get just about 1 packet lost per second on the input. I can confirm this just by running netstat -i every second or so.
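For reference, a quick way to eyeball the per-second drop rate from the shell without parsing netstat output. This is a sketch assuming Linux sysfs counters; `lo` is used here only so it runs anywhere, substitute your actual NIC (e.g. `eth0`):

```shell
#!/bin/sh
# Read the kernel's cumulative rx_dropped counter twice, one second apart.
# The counter only ever increases, so the difference is drops per second.
IFACE=lo   # substitute your real interface, e.g. eth0
d1=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
sleep 1
d2=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
echo "rx drops/sec on $IFACE: $((d2 - d1))"
```

This is the same max-minus-min delta on a monotonic counter that the bucket_script computes, just done by hand against the kernel.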
From your query, here is what I get, which basically just shows the behaviour I have noticed...