All numbers reported by Metricbeat too high (bug?)


(Philipp Krüger) #1

I was wondering why the numbers reported for free disk space were way too high, so I went to investigate. I switched on debug logging and found this in the log:

2016-07-21T14:33:34+02:00 DBG  Publish: {
  "@timestamp": "2016-07-21T12:33:34.783Z",
  "beat": {
    "hostname": "CCT-WEB008-C7V",
    "name": "CCT-WEB008-C7V"
  },
  "metricset": {
    "module": "system",
    "name": "fsstat",
    "rtt": 379
  },
  "system": {
    "fsstat": {
      "count": 32,
      "total_files": 1410425,
      "total_size": {
        "free": 23922294784,
        "total": 35769786368,
        "used": 11847491584
      }
    }
  },
  "type": "metricsets"
}

Compared to the correct results obtained with df:

# df -h
Dateisystem    Größe Benutzt Verf. Verw% Eingehängt auf
/dev/sda3        16G    5,5G  8,4G   40% /
devtmpfs        483M       0  483M    0% /dev
tmpfs           492M       0  492M    0% /dev/shm
tmpfs           492M     19M  473M    4% /run
tmpfs           492M       0  492M    0% /sys/fs/cgroup
/dev/sda2       494M    210M  285M   43% /boot
/dev/sda1       200M    9,5M  191M    5% /boot/efi
tmpfs            99M       0   99M    0% /run/user/0

Compare free disk space of ~8.4 GB with reported free disk space of 23922294784 and total disk space of ~16G with reported total disk space of 35769786368.
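To put the two sets of numbers in the same unit, the byte counts from the debug output can be converted to GiB. This is only a small arithmetic sketch. The helper name `to_gib` is made up for illustration, and it assumes df -h uses powers of 1024.

```python
# Convert the byte counts from the Metricbeat debug output to GiB
# so they can be compared with the human-readable df -h columns.
GIB = 1024 ** 3


def to_gib(num_bytes):
    """Return num_bytes expressed in GiB (powers of 1024)."""
    return num_bytes / GIB


# Values copied from the fsstat event above.
reported = {
    "free": 23922294784,
    "total": 35769786368,
    "used": 11847491584,
}

for name, value in reported.items():
    print(f"{name}: {to_gib(value):.2f} GiB")

# Sanity check: used should be total minus free.
assert reported["total"] - reported["free"] == reported["used"]
```

So fsstat reports roughly 22.3 GiB free out of 33.3 GiB total, against 8.4G available out of 16G for / in df.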

I'm running the 5.0.0-alpha4 versions of ELK+beats.

Any ideas what could be wrong?


(Andrew Kroh) #2

I did a comparison of the values reported and did not find any differences. Here's the data I collected: https://docs.google.com/spreadsheets/d/1VOfpgccSSxbnbWtUlvjotA-2le6fLPhDM1C97NU-6c4/pubhtml

It looks like you are comparing df's "Verf." (available) column to system.fsstat.total_size.free, which are not the same thing. Metricbeat's fsstat metricset does not report available space, but I think it should (care to file an issue or open a PR to fix it?). If you sum the system.filesystem.avail values, they will equal 8.4G.

See http://linux.die.net/man/2/statfs for a low level answer to the difference between "free" and "available".
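As a rough illustration of that difference, here is a minimal Python sketch that reads both counters for one path via os.statvfs. The function name `free_vs_available` and the "reserved" key are made up for this example, and this is not how Metricbeat collects the values.

```python
import os


def free_vs_available(path="/"):
    """Return total, free and available bytes for the filesystem at path.

    f_bfree counts all free blocks, including those reserved for root;
    f_bavail counts only the blocks an unprivileged user may use.
    """
    st = os.statvfs(path)
    block = st.f_frsize  # fragment size, the unit for the block counters
    total = st.f_blocks * block
    free = st.f_bfree * block
    available = st.f_bavail * block
    return {
        "total": total,
        "free": free,
        "available": available,
        "reserved": free - available,  # hypothetical label for the gap
    }


if __name__ == "__main__":
    for key, value in free_vs_available("/").items():
        print(f"{key}: {value}")
```

On filesystems that reserve blocks for root, "free" will be larger than "available", which is why the two should not be compared directly.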


(Philipp Krüger) #3

I will look into a PR if I have time. Any idea about the "total" and "used" sizes, which seem to be wrong as well? The VM's disk is only 16 GB in total, so how can fsstat report 35 GB total and 11 GB used?


(Andrew Kroh) #4

Try doing the same analysis that I did. I expect you'll be able to better determine the source of the error or bug. Compare the output of df --total against one set of samples from the fsstat and filesystem metricsets.

What OS is this?


(Philipp Krüger) #5

OK, these are the results from my little analysis. It seems Metricbeat is counting a number of spurious filesystems...

Spreadsheet of results

I'm on CentOS Linux release 7.2.1511 (Core)


(Andrew Kroh) #6

Thanks for the information. Looks like the mount table contains entries for rootfs and /dev/sda3 that are identical. It seems that df ignores the rootfs entry.

Where is the rootfs device coming from? Is that LVM? Why does df ignore that device? At this point I have more questions than answers. I need to do some reading.
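To make the double counting concrete, here is a small Python sketch that collapses mount table entries sharing a mount point and keeps only the most recent one. It is only an illustration: the sample text is hypothetical (in /proc/mounts format), the function names are made up, and this is not a claim about how df or Metricbeat actually filter entries.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        entries.append((fields[0], fields[1], fields[2]))
    return entries


def dedupe_by_mount_point(entries):
    """Keep only the last entry for each mount point.

    A later mount on the same path shadows the earlier one, so summing
    both would count the same space twice.
    """
    latest = {}
    for device, mount_point, fstype in entries:
        latest.pop(mount_point, None)  # re-insert so the survivor keeps its position
        latest[mount_point] = (device, mount_point, fstype)
    return list(latest.values())


# Hypothetical sample resembling the situation described above.
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/sda3 / xfs rw,relatime 0 0
/dev/sda2 /boot xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    for entry in dedupe_by_mount_point(parse_mounts(SAMPLE)):
        print(entry)
```

With the sample above, the rootfs entry for / is dropped in favor of /dev/sda3, so the root filesystem would only be counted once.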


(system) #7

This topic was automatically closed after 21 days. New replies are no longer allowed.


(Monica Sarbu) #8

I did a bit of research, and it looks like df doesn't always list all the mounted file systems. Here is an example where df doesn't list all NFS-mounted file systems; they only show up with df -a.