Hi, I have noticed that one machine has a higher rate of dropped packets than the others: roughly 1% packet loss, while the other machines are well below 1%.
I.e:
Machine 1: 14 dropped packets out of 200 million.
Machine 2: 2 million dropped packets out of 200 million.
You see "dropped": 2750373. Is this number cumulative over the uptime of the machine, or is it how many packets were dropped at that particular timestamp?
How a system knows packets are lost is something I haven't reviewed in several years (we covered it in a Wireshark class), but Google can help with that; for example: https://likegeeks.com/fix-packet-loss/
Debugging packet loss is probably a topic for another forum.
@javadevmtl It should be a monotonically increasing number: as packets go across the wire, the system increments counters for errors, packets, dropped packets, and bytes. Metricbeat samples these counters and records them to Elasticsearch. To view this as a rate, you will need to apply a derivative pipeline aggregation inside a date_histogram aggregation. If you need the total number of packets for a specific time period, you will need to subtract the min from the max using a bucket_script.
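A minimal sketch of the rate approach, sent to a `metricbeat-*` `_search` endpoint. The field name `system.network.in.dropped` and the one-minute `fixed_interval` (Elasticsearch 7.x syntax) are assumptions; adjust them to your mapping and version:

```json
{
  "size": 0,
  "aggs": {
    "per_minute": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1m"
      },
      "aggs": {
        "dropped_max": { "max": { "field": "system.network.in.dropped" } },
        "dropped_rate": {
          "derivative": { "buckets_path": "dropped_max" }
        }
      }
    }
  }
}
```

Each bucket's `dropped_rate` is the increase in the cumulative counter versus the previous bucket, i.e. drops per minute.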
Here is an example of sampling the entire time range:
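A minimal sketch, assuming the Metricbeat field `system.network.in.dropped` and grouping by `beat.hostname` (both assumptions, adjust to your mapping):

```json
{
  "size": 0,
  "aggs": {
    "per_host": {
      "terms": { "field": "beat.hostname" },
      "aggs": {
        "first": { "min": { "field": "system.network.in.dropped" } },
        "last":  { "max": { "field": "system.network.in.dropped" } },
        "total_dropped": {
          "bucket_script": {
            "buckets_path": { "first": "first", "last": "last" },
            "script": "params.last - params.first"
          }
        }
      }
    }
  }
}
```

`total_dropped` is the max-minus-min of the cumulative counter per host over whatever time range the query matches.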
You will want to change the query to limit this to a specific time range and host. The unfortunate part of bucket_script is that you have to run it inside a multi-bucket aggregation like date_histogram or a terms aggregation.
Hi, thanks. I looked at the sample Kibana dashboard that Metricbeat installs and came up with something. I get just about 1 packet lost per second on the input. I can confirm this just by running netstat -i every second or so.
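For reference, a quick way to eyeball the per-second drop rate from the shell without parsing netstat output. This is a sketch assuming Linux sysfs counters; `lo` is used here only so it runs anywhere, substitute your actual NIC (e.g. `eth0`):

```shell
#!/bin/sh
# Read the kernel's cumulative rx_dropped counter twice, one second apart.
# The counter only ever increases, so the difference is drops per second.
IFACE=lo   # substitute your real interface, e.g. eth0
d1=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
sleep 1
d2=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)
echo "rx drops/sec on $IFACE: $((d2 - d1))"
```

This is the same max-minus-min delta on a monotonic counter that the bucket_script computes, just done by hand against the kernel.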
From your query, here is what I get, which basically just shows the behaviour I have noticed...