We're just setting up Metricbeat and are getting communication errors with Logstash on a number of machines:
2018-11-15T12:01:59.051+0100 INFO pipeline/output.go:95 Connecting to backoff(async(tcp://anonDEST_HOST:anonDEST_PORT))
2018-11-15T12:01:59.088+0100 INFO pipeline/output.go:105 Connection to backoff(async(tcp://anonDEST_HOST:anonDEST_PORT)) established
2018-11-15T12:01:59.130+0100 ERROR logstash/async.go:256 Failed to publish events caused by: read tcp anonSRC_HOST:anonSRC_PORT->anonDEST_HOST:anonDEST_PORT: use of closed network connection
2018-11-15T12:01:59.130+0100 ERROR logstash/async.go:256 Failed to publish events caused by: unsupported float value: NaN
We see this with versions 6.4.2, 6.4.3 and 6.5.0.
Tracing the communication errors, we found they are caused by the system.diskio metricset (disabling it makes everything work smoothly).
Drilling into the issue, with the file output, only system.diskio enabled and logging.level set to debug, the only messages in the log are:
2018-11-15T14:32:54.612+0100 WARN fileout/file.go:127 Failed to serialize the event: unsupported float value: NaN
2018-11-15T14:32:54.612+0100 WARN fileout/file.go:127 Failed to serialize the event: unsupported float value: NaN
Despite the error, some events are still written to the file.
This looks like a bug. Could you provide more details about your scenario so we can try to reproduce it? It would be especially useful to know the operating system and the configuration of the diskio metricset. It may also be relevant to know whether this host uses any special storage system.
I observed the issue on 7 different machines I created over the last couple of days and added monitoring to today. They are all Ubuntu 16.04 LTS Azure VMs, created with Canonical's images.
Observed on Metricbeat 6.5.0, with Logstash at 6.4.2 and 6.5.0 (I upgraded to see if it would fix the bug).
If it helps, I am seeing the same issue on a fresh install of Metricbeat with Elasticsearch as the output, both products on 6.5. It was working on 6.4 with the same metricbeat.yml on the client.
Commenting out diskio from metricbeat.yml removes the error from log.
ERROR elasticsearch/client.go:374 Failed to encode event: unsupported float value: NaN
The OS on both client and server is Debian 9.5 Stretch, fully patched, running on CloudStack VMs (KVM).
As a workaround to get Metricbeat working again, I cloned the 6.5 branch, applied the changes to diskstat_linux.go from "Cherry-pick #9125 to 6.5: Fix division by zero on diskio metricset" (#9137), and built it. I can confirm that the issue is gone.
metricbeat version 6.5.1 (amd64), libbeat 6.5.1 [1e936aef2a0bdca6e8709b3624aa1c9dbac102b2 built 2018-11-17 03:13:22 +0000 UTC]
From the release notes, it looks like the fix for issue #9124 was not included in the 6.5.1 release that is now in the deb repository. The rebuild described above, which got it working again, was done by applying the changes to diskstat_linux.go from GitHub to the source myself before building.