system.diskio.iostat.await value spikes far too high

Hi all, I'm using the Metricbeat system module (version 7.2.0) to collect system KPIs, but the system.diskio.iostat.await value it reports is sometimes far too large to be plausible.
I've checked that nothing is wrong with the system itself: I also monitor it with another tool, Munin, and compared its readings against Metricbeat's.
I can't reproduce the issue on demand. It has happened twice so far: the first time after the system had been running for about 2 days, and the second after nearly a week of uptime.
Has anyone else run into this issue? If so, could you please help me?
Thanks.


Screenshot from 2020-03-19 16-36-33

Could you please share your configuration formatted using </> and the debug logs (./metricbeat -e -d "*")?

Thanks for your response. Here is my configuration:

    metricbeat.modules:
    - module: system
      metricsets:
        - cpu
        - load
        - core
        - diskio
        - filesystem
        - fsstat
        - memory
        - network
        - process
      enabled: true
      period: 300s
      processes: ['.*']
    - module: mysql
      metricsets: ["status"]
      hosts: ["tcp(127.0.0.1:3306)/"]
      period: 300s
      username: root
      password:
    output.elasticsearch:
      hosts: ["192.168.81.81:9200"]
      index: "metricbeat-%{[agent.version]}-%{+yyyy.MM.dd}"
    logging.to_syslog: false
    logging.metrics.enabled: false
    logging.level: info
    logging.to_files: true
    logging.files:
      path: /var/log/metricbeat
      name: metricbeat
      keepfiles: 7
      permissions: 0644
    setup.ilm.enabled: false
    setup.template.name: "metricbeat"
    setup.template.pattern: "metricbeat-*"

As you can see, I have turned off most of the logging, and the system is still running a few other things, so I don't have any debug logs to give you. Sorry about that.

Thanks.

@HenryDuong A few more things,

  1. What OS/distribution is this?
  2. Can you upgrade to 7.6 and see if the problem changes?
  3. Can you paste the output of /proc/diskstats?

Hi @Alex_Kristiansen, thanks for your support.

  1. I'm using Ubuntu 16.04.6 LTS with kernel 4.4.0-174-generic.
  2. I need to find the root cause first so I can convince our team to upgrade, since we want to settle on the most stable version for monitoring system performance. Based on our latest tests, we decided to use 7.2.0.
  3. Here is the output:
cat /proc/diskstats
   7       0 loop0 0 0 0 0 0 0 0 0 0 0 0
   7       1 loop1 0 0 0 0 0 0 0 0 0 0 0
   7       2 loop2 0 0 0 0 0 0 0 0 0 0 0
   7       3 loop3 0 0 0 0 0 0 0 0 0 0 0
   7       4 loop4 0 0 0 0 0 0 0 0 0 0 0
   7       5 loop5 0 0 0 0 0 0 0 0 0 0 0
   7       6 loop6 0 0 0 0 0 0 0 0 0 0 0
   7       7 loop7 0 0 0 0 0 0 0 0 0 0 0
   8       0 sda 21504490 249229 5508021615 20115976 32207954 17988013 4206576925 62496376 0 14271508 82566560
   8       1 sda1 1803 1305 22100 508 2 0 2 0 0 80 508
   8       2 sda2 24296 0 48592 4060 0 0 0 0 0 4052 4052
   8       5 sda5 173 0 1408 76 3 0 18 0 0 76 76
   8       6 sda6 198630 1029 7354266 81100 9108659 9909536 428513232 9884756 0 1564820 9964056
   8       7 sda7 24887 0 826808 28368 0 0 0 0 0 28308 28336
   8       8 sda8 10111801 9027 2743105954 16069360 19629918 5728349 851597352 16256784 0 8732424 32309184
   8       9 sda9 11043513 32 2753935023 3892996 3415361 1142797 2916375553 35342600 0 6983092 39206404
   8      10 sda10 99170 237836 2711912 39432 54011 1207331 10090768 1012236 0 214824 1051496
   8      16 sdb 90186669 3 734075562 26217156 2578759732 1362378882 32264807568 1747002560 0 76967536 1772556028
   8      17 sdb1 90186639 3 734073442 26217152 2578759732 1362378882 32264807568 1747002560 0 76971288 1772131572
   8      32 sdc 90176023 2 734764810 26063380 2578595714 1355740799 32210418960 1693896760 0 76883188 1719520868
   8      33 sdc1 90175994 2 734762722 26063364 2578595714 1355740799 32210418960 1693896760 0 76886688 1718596464

I hope you can find some clues there. Sorry again that I had turned off all the logging before.

Thank you
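For context on where this metric comes from: the diskio metricset derives an iostat-style await from the deltas of /proc/diskstats counters between collection periods, dividing the change in milliseconds spent on I/O by the change in completed operations. The sketch below is a simplified illustration in Python, not Metricbeat's actual Go implementation, and the helper names are hypothetical; the field positions follow the kernel's documented diskstats layout. Note that if a kernel counter resets or wraps between two samples, a naive delta can produce a huge spurious value, which would fit spikes that only appear after days of uptime.

```python
# Sketch: deriving an iostat-style "await" from two snapshots of one
# /proc/diskstats line. After major/minor/name, the fields used here are:
#   index 0: reads completed    index 3: ms spent reading
#   index 4: writes completed   index 7: ms spent writing

def parse_diskstats_line(line):
    """Split one /proc/diskstats line into (device name, counter fields)."""
    parts = line.split()
    name = parts[2]
    fields = [int(x) for x in parts[3:]]
    return name, fields

def await_ms(prev, curr):
    """Average wait per I/O (ms) between two snapshots of the same device."""
    d_ops = (curr[0] - prev[0]) + (curr[4] - prev[4])
    d_ms = (curr[3] - prev[3]) + (curr[7] - prev[7])
    if d_ops <= 0:
        # No I/O completed in the interval (or a counter reset/wrap made the
        # delta non-positive): await is undefined, so report 0 rather than
        # dividing by zero or emitting a bogus spike.
        return 0.0
    return d_ms / d_ops

# Two hypothetical snapshots taken one period apart:
prev = parse_diskstats_line("8 0 sda 100 0 800 50 200 0 1600 250 0 120 300")[1]
curr = parse_diskstats_line("8 0 sda 160 0 1280 110 260 0 2080 370 0 180 480")[1]
print(await_ms(prev, curr))  # (60 + 120) ms / (60 + 60) ops = 1.5
```

If either millisecond counter wrapped between samples, the raw delta would be wildly wrong; a real collector would need to detect and discard such intervals.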

@HenryDuong Sorry for the delay. This is a bit of a difficult problem to trace. Are the "bad" Elasticsearch entries intermittent? Do they seem to happen at regular intervals? Also, could you paste some of the bad ES entries in their entirety, in raw JSON form?