I'd like to ask about some test failures and then open an issue if it makes sense.
I'm running the metricbeat unit tests on arm64 and s390x and I'm getting these test failures:
FAIL github.com/elastic/beats/v7/metricbeat/module/etcd/metrics 0.820s
FAIL github.com/elastic/beats/v7/metricbeat/module/kubernetes/proxy 0.206s
FAIL github.com/elastic/beats/v7/metricbeat/module/kubernetes/controllermanager 5.974s
FAIL github.com/elastic/beats/v7/metricbeat/module/kubernetes/scheduler 0.255s
FAIL github.com/elastic/beats/v7/metricbeat/module/kubernetes/apiserver 31.262s
FAIL github.com/elastic/beats/v7/metricbeat/mb/testing/data 68.656s
The failures all have the same pattern: the bucket array is missing the values which are zero, e.g.:
diff -u bucket-values-expected.json bucket-values-actual.json
--- bucket-values-expected.json 2022-12-09 12:05:20.273485393 -0500
+++ bucket-values-actual.json 2022-12-09 12:04:58.043485663 -0500
@@ -31,15 +31,12 @@
"ns": {
"bucket": {
"+Inf": 3,
- "1000000": 0,
"1024000000": 3,
"128000000": 3,
"16000000": 2,
- "2000000": 0,
"2048000000": 3,
"256000000": 3,
"32000000": 2,
- "4000000": 0,
"4096000000": 3,
"512000000": 3,
"64000000": 3,
I've traced the issue back to the prometheus code that the kubernetes module uses. There are some casts from float64 NaN and Inf to uint64, and the results of those casts are platform dependent in Go. There are a few places where this happens, and the code is similar to:
if bucket.GetCumulativeCount() != uint64(math.NaN()) && bucket.GetCumulativeCount() != uint64(math.Inf(0)) { ...save value...}
On amd64, uint64(math.NaN()) is 0x8000000000000000.
On arm64 and s390x, uint64(math.NaN()) is 0, so buckets with value zero end up getting filtered out.