System.diskio obtaining NaN (Failed to serialize the event: unsupported float value: NaN)

niteman · November 15, 2018, 1:37pm

We're just setting up metricbeat and getting in a bunch of machines communication errors with logstash:

|2018-11-15T12:01:59.051+0100|INFO|pipeline/output.go:95|Connecting to backoff(async(tcp://anonDEST_HOST:anonDEST_PORT))|
|---|---|---|---|
|2018-11-15T12:01:59.088+0100|INFO|pipeline/output.go:105|Connection to backoff(async(tcp://anonDEST_HOST:anonDEST_PORT)) established|
|2018-11-15T12:01:59.130+0100|ERROR|logstash/async.go:256|Failed to publish events caused by: read tcp anonSRC_HOST:anonSRC_PORT->anonDEST_HOST:anonDEST_PORT: use of closed network connection|
|2018-11-15T12:01:59.130+0100|ERROR|logstash/async.go:256|Failed to publish events caused by: unsupported float value: NaN|

We got that with versions 6.4.2, 6.4.3 and 6.5.0

Tracing the communication errors we've found they are caused by system.diskio (disabling it makes everything work smooth)

Drilling into the issue, when using file output and only system.diskio with logging.level debug we only get in the logs:

2018-11-15T14:32:54.612+0100    WARN    fileout/file.go:127     Failed to serialize the event: unsupported float value: NaN
2018-11-15T14:32:54.612+0100    WARN    fileout/file.go:127     Failed to serialize the event: unsupported float value: NaN

Despite of the error some events get logged at the file

jsoriano · November 15, 2018, 2:53pm

Hi @niteman and welcome

This looks like a bug, could you provide more details about your scenario so we can try to reproduce it? It could be specially useful to know the operating system and the configuration for diskio metricset. For this case it can be also relevant to know if you use some special storage system in this host.

Thanks for reporting!

josemaia · November 15, 2018, 3:33pm

Reporting the same issue.

My configuration for diskio is as vanilla as it gets:

- module: system
  metricsets:
    # CPU stats
    - cpu
    (...)
    # IO stats
    - diskio
   (...)
  enabled: true
  period: 10s
  processes: ['.*']
  cpu.metrics: ["percentages", "normalized_percentages"]

running processors:
processors:
- add_docker_metadata: ~
- add_cloud_metadata: ~
- add_locale: ~

I observed the issue on 7 different machines I created over the last couple of days and added monitoring to today. They are all Ubuntu 16.04 LTS Azure VMs, created with Canonical's images.

Observed on Metricbeat 6.5.0, with Logstash at 6.4.2 and 6.5.0 (i upgraded to see if it'd fix the bug).

niclas · November 16, 2018, 4:03am

If it helps, I am seeing the same issue on fresh install of metricbeat with Elasticsearch as output with both products on 6.5 which was working on 6.4 with same metricbeat.yml on client.

Commenting out diskio from metricbeat.yml removes the error from log.

ERROR elasticsearch/client.go:374 Failed to encode event: unsupported float value: NaN

OS on Client and Server is Debian 9.5 Stretch fully patched running on Cloudstack VM's (KVM).

niteman · November 16, 2018, 8:36am

@jsoriano details:

New package instal
Server details: Bate metal with hardware RAID controller
Versions:

# cat /etc/debian_version 
8.10
# metricbeat version
metricbeat version 6.5.0 (amd64), libbeat 6.5.0 [ff5b9b3db49856a25b5eda133b6997f2157a4910 built 2018-11-09 18:03:04 +0000 UTC]

Config:

---
name: anon_host
fields_under_root: false
logging:
  level: debug
  files:
    keepfiles: 7
    name: metricbeat
    path: "/var/log/metricbeat"
    rotateeverybytes: '10485760'
  metrics:
    enabled: false
    period: 30s 
  selectors: :undef
  to_files: true
  to_syslog: false
queue:
  mem:
    events: 4096
    flush:
      min_events: 0
      timeout: 0s
metricbeat:
  modules:
  - module: system
    metricsets:
    - diskio
    enabled: true
    period: 10s 
output:
  file:
    enabled: true
    path: "/tmp/metricbeat"

jsoriano · November 16, 2018, 11:39am

Thanks all for the details, I have opened a new issue https://github.com/elastic/beats/issues/9124

swtrux · November 16, 2018, 12:54pm

I'm seeing the same thing on RHEL 7.3 after installing 6.5...

niclas · November 17, 2018, 3:54am

As a workaround to get metricbeat back working I cloned the 6.5 branch and built with the changes in Cherry-pick #9125 to 6.5: Fix division by zero on diskio metricset #9137 to diskstat_linux.go and I can confirm that the issue is gone.

metricbeat version 6.5.1 (amd64), libbeat 6.5.1 [1e936aef2a0bdca6e8709b3624aa1c9dbac102b2 built 2018-11-17 03:13:22 +0000 UTC]

John_Morrison · November 21, 2018, 3:37pm

diskio is not reporting well due to this, and still having same log error with metricbeat-6.5.1-1

niclas · November 21, 2018, 9:35pm

From the release notes, it looks like issue #9124 was not included in the 6.5.1 release that is now in the deb repository. My recompile above that got it back working was by editing the source myself before building with the changes to diskstat_linux.go on github.

system · December 19, 2018, 9:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Failed to publish events caused by: unsupported float value: NaN Beats metricbeat	5	1465	August 23, 2018
Failed to publish events: unsupported float value: NaN Beats metricbeat	16	2137	April 15, 2019
Failed to encode event: unsupported float value: NaN Beats metricbeat	8	1523	March 21, 2019
Metricbeat 7.7.1 : Failed to publish events caused by: unsupported float value: NaN Beats metricbeat	2	500	August 10, 2020
5 of 16 shards failed Beats fleet , metricbeat	7	657	March 22, 2019

System.diskio obtaining NaN (Failed to serialize the event: unsupported float value: NaN)

Related topics