Error fetching data for metricset linux.pageinfo: error reading pagetypeinfo

We are using Metricbeat 7.17.3 as of this post, and we are receiving an error on several systems that are configured to use the linux.pageinfo metricset. The same error was also reported here, but that topic has since auto-closed.

May 17 15:38:58 example.host metricbeat[12345]: 2022-05-17T15:38:58.783-0700        ERROR        module/wrapper.go:259        Error fetching data for metricset linux.pageinfo: error reading pagetypeinfo: error parsing zone: : strconv.ParseInt: parsing "": invalid syntax

This appears to be caused by the way pagetypeinfo displays a count once it exceeds 100000, combined with the way the module parses each column as an integer, as seen roughly here in the regular expression and the subsequent parsing of an expected integer.

Example:
pagetypeinfo reports >100000 for the Movable type in the Normal zone at order 1, while buddyinfo reports an exact count for the same zone and order: 213501, summed across all migrate types, as seen below.

$ cat /proc/pagetypeinfo

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      0      0      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable     45    490   2125   1535    908    429    139     10      0      0      0
Node    0, zone    DMA32, type  Reclaimable    554    760   1801   1748   1184    671    239     17      3      0      0
Node    0, zone    DMA32, type      Movable    206  33222  18654   4021    946    186     80     34      6      0      0
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable   1687  31115  18021  11061   3504    614     31      1      0      0      0
Node    0, zone   Normal, type  Reclaimable  15289  16293  13341  10028   4203    909     71     13      3      0      0
Node    0, zone   Normal, type      Movable      9 >100000  88777  13664   2611    818    240     14      5      0      0
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
Node 0, zone      DMA            1            0            7            0            0            0
Node 0, zone    DMA32          165          190         1173            0            0            0
Node 0, zone   Normal         1093          724         4839            0            0            0

$ cat /proc/buddyinfo
Node 0, zone      DMA      1      0      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32    668  34432  22579   7304   3038   1286    458     61      9      0      0
Node 0, zone   Normal  17254 213501 120140  34753  10318   2341    342     28      8      0      0

I'm also seeing this issue. As mentioned, pagetypeinfo returns more detail on page migration types, but for large quantities its totals are less accurate than those in buddyinfo. From the kernel code here, you can see that pagetypeinfo truncates any count over 100000 and just displays >100000.
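To illustrate how the exact totals could be read from buddyinfo directly, here is a minimal Go sketch. It is not Metricbeat's code: the `zoneFree` type and the `parseBuddyLine` function are hypothetical names, and the sketch assumes the line layout shown in the buddyinfo output above (node, zone, then one count per order).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// zoneFree holds the free-block counts for one zone, indexed by order.
// (Hypothetical type for illustration, not part of Metricbeat.)
type zoneFree struct {
	Node   string
	Zone   string
	Counts []int64
}

// parseBuddyLine parses one /proc/buddyinfo line such as
// "Node 0, zone   Normal  17254 213501 ...".
func parseBuddyLine(line string) (zoneFree, error) {
	f := strings.Fields(line)
	// Fields 0-3 are: Node, "0,", zone, "Normal"; counts follow.
	if len(f) < 5 || f[0] != "Node" || f[2] != "zone" {
		return zoneFree{}, fmt.Errorf("unexpected buddyinfo line: %q", line)
	}
	z := zoneFree{Node: strings.TrimSuffix(f[1], ","), Zone: f[3]}
	for _, s := range f[4:] {
		n, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return zoneFree{}, fmt.Errorf("parsing count %q: %w", s, err)
		}
		z.Counts = append(z.Counts, n)
	}
	return z, nil
}

func main() {
	z, err := parseBuddyLine("Node 0, zone   Normal  17254 213501 120140  34753  10318   2341    342     28      8      0      0")
	if err != nil {
		panic(err)
	}
	fmt.Println(z.Zone, z.Counts[1]) // prints "Normal 213501"
}
```

Because buddyinfo never uses the > marker, every column here parses as a plain integer.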

The regex used for the pagetypeinfo stats should be corrected to at least handle the > indicator. In addition, instead of emulating the buddyinfo stats from pagetypeinfo, the module should read buddyinfo directly to get proper totals.
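As a rough idea of what tolerating the > marker might look like, here is a small Go sketch. It is an assumption about one possible approach, not the module's actual parser: `parseCount` and `parseCounts` are hypothetical helpers, and it splits on whitespace rather than using the module's regex. A capped value is kept as a lower bound and flagged, so the caller can decide how to report it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCount converts one pagetypeinfo count column to an integer.
// A leading ">" means the kernel capped the displayed value, so the
// number is only a lower bound; capped reports that case.
// (Hypothetical helper, not part of Metricbeat.)
func parseCount(field string) (n int64, capped bool, err error) {
	if strings.HasPrefix(field, ">") {
		capped = true
		field = strings.TrimPrefix(field, ">")
	}
	n, err = strconv.ParseInt(field, 10, 64)
	if err != nil {
		return 0, capped, fmt.Errorf("parsing count %q: %w", field, err)
	}
	return n, capped, nil
}

// parseCounts extracts the per-order counts from a "Free pages count"
// row such as "Node 0, zone Normal, type Movable 9 >100000 ...".
// It returns the counts and the orders whose value was capped.
func parseCounts(line string) (counts []int64, cappedOrders []int, err error) {
	f := strings.Fields(line)
	// Fields 0-5 are: Node, "0,", zone, "Normal,", type, "Movable".
	if len(f) < 7 || f[0] != "Node" || f[4] != "type" {
		return nil, nil, fmt.Errorf("unexpected pagetypeinfo line: %q", line)
	}
	for order, s := range f[6:] {
		n, capped, err := parseCount(s)
		if err != nil {
			return nil, nil, err
		}
		counts = append(counts, n)
		if capped {
			cappedOrders = append(cappedOrders, order)
		}
	}
	return counts, cappedOrders, nil
}

func main() {
	line := "Node    0, zone   Normal, type      Movable      9 >100000  88777  13664   2611    818    240     14      5      0      0"
	counts, capped, err := parseCounts(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts[1], capped) // prints "100000 [1]"
}
```

Flagging the capped orders, rather than silently storing 100000, leaves room to fill in those totals from buddyinfo.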
