System.diskio metrics written sparsely?


(Jeffkirk1) #1

Hey folks, I was wondering whether the Metricbeat behavior I'm seeing with system.diskio metrics is the norm or whether I might have a configuration issue.

I've deployed Metricbeat on about 2700 servers (almost all of which are KVM-based VMs running on CentOS 7.4 hypervisors) in my data center. I am trying to identify some of these VMs that have activity on their swap partitions so I can allocate more RAM to them.

The way I'm doing this is to measure system.diskio.iostat.read.request.per_sec and system.diskio.iostat.write.request.per_sec on the swap partition (defined by default in our kickstart config as /dev/dm-1). This seems like a decent way of detecting the activity, but I noticed that these metrics are written sparsely instead of at the 10-minute intervals I've defined for data collection.

In other words, looking at these results in the Discover tab histogram, for host A I see 60 hits distributed in clumps across a 24-hour period, but for host B I might see ten or fewer hits.
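For scale, a 10-minute period should produce 144 events per device over 24 hours, so 60 hits is roughly 42% of what I'd expect and ten hits is about 7%. A minimal sketch of that arithmetic (the function names are just illustrative, and it assumes exactly one event per device per period):

```python
def expected_events(period_minutes: int, window_hours: int = 24) -> int:
    """Number of events one device should report within the window."""
    return (window_hours * 60) // period_minutes


def coverage(observed: int, period_minutes: int, window_hours: int = 24) -> float:
    """Fraction of the expected events that actually showed up."""
    return observed / expected_events(period_minutes, window_hours)


print(expected_events(10))         # 144
print(round(coverage(60, 10), 2))  # 0.42 (host A)
print(round(coverage(10, 10), 2))  # 0.07 (host B)
```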

My filters specify the following:

system.diskio.iostat.read.per_sec.bytes: exists
system.diskio.name: "dm-1"

And of course beat.hostname is set to the host I'm interested in.
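Roughly the same filters can be expressed as an Elasticsearch query body, which makes it easier to count hits per 10-minute bucket outside of Kibana. This is only a sketch under assumptions: the host name "host-a" is a placeholder, the field checked for existence defaults to the read request rate mentioned above, the fields are assumed to be mapped as keywords, and the `interval` key in the date histogram is spelled `fixed_interval` on newer Elasticsearch versions.

```python
import json


def build_diskio_query(
    hostname: str,
    device: str = "dm-1",
    field: str = "system.diskio.iostat.read.request.per_sec",
    window: str = "now-24h",
) -> dict:
    """Build a query body mirroring the Kibana filters described above."""
    return {
        "size": 0,  # only the bucket counts are needed, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"beat.hostname": hostname}},
                    {"term": {"system.diskio.name": device}},
                    {"exists": {"field": field}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "@timestamp",
                    "interval": "10m",   # assumption: older-style key
                    "min_doc_count": 0,  # keep empty buckets so gaps are visible
                }
            }
        },
    }


# "host-a" is a hypothetical host name used only for illustration.
print(json.dumps(build_diskio_query("host-a"), indent=2))
```

Buckets with a doc_count of 0 in the response would correspond to the collection intervals where nothing was written for that device.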

Is this sparseness of results by design, or is this an error?


(Kaiyan Sheng) #2

Hello, thanks for all the details. Do you see any error messages in the log similar to "Unable to fetch disk information"?


(Kaiyan Sheng) #3

Also, depending on which version of Metricbeat you are using, you might be hitting this issue: https://github.com/elastic/beats/issues/9124