Metricbeat "System Overview" dashboards in Kibana are blank

Hello, I've been trying to figure this out for a while now. Initially I thought it had to do with my Beats agents not matching my ELK version, but I updated everything today and the problem persists.

On the Kibana dashboard for Metricbeat, there is a bar chart visualization called "Top Hosts By CPU (Realtime) [Metricbeat System] ECS". I have data in the appropriate fields from my Metricbeat agents for all hosts, but some of the hosts don't show up on the chart itself. Some get a percentage while others get a dash, and some have a bar while others don't. I can't figure it out because there are no errors or missing fields as far as I can tell.

If I set the time range from 6 hours ago to "now", the bars populate, but not with accurate values from the recent records. Do I have a timezone problem somewhere?

Go to Discover, look at the metricbeat-* index pattern, and check the timestamps.

Timestamps are always stored in Elasticsearch as UTC and, by default, are displayed in Kibana in the browser's local timezone.

It is also best practice for the servers you are monitoring to have the correct system time and timezone.
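To make the UTC point concrete, here is a minimal sketch of what happens between storage and display. The timestamp value and the UTC-5 browser offset are made up for illustration; nothing here comes from the actual index.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical @timestamp value as Elasticsearch stores it (UTC, ISO 8601).
stored = "2021-06-10T18:30:00.000Z"

# Parse the stored value; the trailing "Z" means UTC.
utc_time = datetime.strptime(stored, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# Assumed browser offset of UTC-5, purely for illustration.
browser_tz = timezone(timedelta(hours=-5))
local_time = utc_time.astimezone(browser_tz)

print(utc_time.isoformat())    # what is stored
print(local_time.isoformat())  # what a browser at UTC-5 would show
```

Both values describe the same instant, so a display timezone setting alone should not move documents in or out of a time range.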

Hey Stephen.. thanks so much for the reply.

I did the following:

  • Set up the NTP service on each of my devices
  • Set Kibana to use UTC in Advanced Settings (it was browser)
  • Checked the timestamps in metricbeat-* index

The devices all have the same time in UTC, and the @timestamp field is in UTC in the index, but the problem still persists. It's really odd, because if you look at the screenshot I posted, the percentages are there but the bars don't appear.

Well, usually people leave the Kibana time zone setting at Browser so it uses the local timezone.

I have seen that missing-bar bug in the Top N visualization before. What version are you using?

It's the empty / dashes that are causing the issue, I think.

Just for fun do a KQL filter in the KQL bar on one of the hosts that has a Top N value like the 14.6% one.

Try the host overview and see what you see.
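In the KQL bar that filter could look something like `host.name : "win-host-01" and system.cpu.user.pct : *`, where the host name is a placeholder. If you would rather check outside Kibana, here is a rough sketch that builds the equivalent Elasticsearch search body. The function name, host name, and 15-minute window are assumptions for illustration only.

```python
import json

def host_cpu_query(host_name, minutes=15):
    """Build a search body for one host's recent documents that carry the CPU field."""
    return {
        "size": 5,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    # Only this host (placeholder name supplied by the caller).
                    {"term": {"host.name": host_name}},
                    # Only documents where the CPU field is actually present.
                    {"exists": {"field": "system.cpu.user.pct"}},
                    # Only recent documents, using Elasticsearch date math.
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }

body = host_cpu_query("win-host-01")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Dev Tools after `GET metricbeat-*/_search`. Comparing the newest hits for a host with a bar against a host with a dash shows whether recent CPU documents exist for both.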

Hey Stephen, sorry for the late response. I'm actually in an Elasticsearch engineering class that's eating up a bit of time :)

I did click on the host overview for one host with a dash and one without, and, as you might expect, some data was missing for the ones with a dash. What's interesting is that all the hosts with dashes are Windows and the others are *nix boxes.

Is there maybe something wrong with my Beats config that's causing this? Nothing has changed recently, but this problem started a few versions back. I'm on version 7.13.1 across the board, but this has definitely been happening since before 7.13.0.

Here's my metricbeat.yml for Windows:

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

output.elasticsearch:
  hosts: ["10.0.0.5:9200"]

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  
metricbeat.modules:

- module: windows
  metricsets: ["service"]
  enabled: true
  period: 60s

- module: windows
  metricsets: [perfmon]
  period: 10s
  perfmon.ignore_non_existent_counters: true
  perfmon.group_measurements_by_instance: true
  perfmon.queries:
  - object: "Process"
    instance: ["svchost*", "conhost*"]
    counters:
    - name: "% Processor Time"
      field: time.processor.pct
      format: "float"
    - name: "Thread Count"
      field: thread_count
    - name: "IO Read Operations/sec"
  - object: "PhysicalDisk"
    field: "disk"
    instance: "*"
    counters:
    - name: "Disk Writes/sec"
    - name: "% Disk Write Time"
      field: "write_time"
      format: "float"

- module: system
  metricsets:
    - cpu
    - memory
    - network
    - process
    - process_summary
    - uptime
    - socket_summary
    - core
    - diskio
    - filesystem
    - fsstat
  enabled: true
  period: 10s
  processes: ['.*']

  cpu.metrics:  ["percentages","normalized_percentages"]
  core.metrics: ["percentages"]

Here's my metricbeat.yml for Linux:

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

output.elasticsearch:
  hosts: ["10.0.0.5:9200"]

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

metricbeat.modules:

- module: linux
  period: 10s
  metricsets:
    - "pageinfo"
    - "memory"
    - "ksm"
    - "conntrack"
    - "iostat"
  enabled: true

- module: system
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary
    - uptime
    - socket_summary
    - core
    - diskio
    - filesystem
    - fsstat
    - raid
    - socket
    - service
  enabled: true
  period: 10s
  processes: ['.*']

  cpu.metrics:  ["percentages","normalized_percentages"]
  core.metrics: ["percentages"]

It looks like the first visualization (Top Hosts By CPU (Realtime)) depends on system.cpu.user.pct. Some records in the metricbeat-* index have the value and some don't, but even the Windows records have that mix.

Very confusing. Even if there is missing data for Windows, why would that cause the bars not to render?

Cool, glad you're taking the engineering class.

I think you're overthinking it. I think it's a bug in the visualization, and apologies, but I don't really have time to track it down right now. If I remember correctly, it had something to do with sorting or something like that.

Make a different visualization; it's good practice. Go to Lens and make your own.

If I were you, I wouldn't get hung up on that. Create a new visualization, learn Lens, go into TSVB and look at it closely. That dash isn't worth losing days over. :)

In all the Beats you have to look at the metricsets within each one: there are network, system, and process metricsets, and so on. The CPU metrics will be in the system ones, not in the network ones. So there are different types of data depending on the metricset.

Take a look at the metricset.name field in your documents and you will start to get it.
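As a rough illustration of why some records have system.cpu.user.pct and some don't, here is a small sketch that tallies documents by metricset.name. The documents are hypothetical and heavily trimmed; real Metricbeat events carry many more fields.

```python
from collections import defaultdict

# Hypothetical, trimmed-down events; values are invented for illustration.
docs = [
    {"metricset": {"name": "cpu"}, "system": {"cpu": {"user": {"pct": 0.146}}}},
    {"metricset": {"name": "memory"}, "system": {"memory": {"used": {"pct": 0.52}}}},
    {"metricset": {"name": "network"}, "system": {"network": {"name": "eth0"}}},
    {"metricset": {"name": "cpu"}, "system": {"cpu": {"user": {"pct": 0.031}}}},
]

def get_path(doc, dotted):
    """Return the value at a dotted field path, or None if any part is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Count, per metricset, how many documents do and do not carry the CPU field.
counts = defaultdict(lambda: {"with_field": 0, "without_field": 0})
for doc in docs:
    name = doc["metricset"]["name"]
    has_value = get_path(doc, "system.cpu.user.pct") is not None
    counts[name]["with_field" if has_value else "without_field"] += 1

for name, tally in sorted(counts.items()):
    print(name, tally)
```

In this toy data only the cpu documents carry the field, so a mix of records with and without it in the same index is expected and is not by itself a sign of missing data.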

Heh, I hear ya on the time thing. I appreciate the leads though.

Interestingly, I added a "1h" value in the "Offset series time by (1m, 1h, 1w, 1d)" field of that chart and it started working, so I'm going to try to figure out what that setting actually represents and why it restored the visualization. I'm sure the data no longer represents what it was intended to now that I've changed the parameters, so I just need to understand what that parameter actually does. lol

If it's a bug, I might submit a bug report on GitHub. Doesn't anyone use the supplied dashboards IRL?


Yes, but a lot of people customize them.

But please do open a bug if you can make it repeatable.

Pretty sure it has to do with the last bucket: when you offset, there isn't a last bucket.

Again, that's the TSVB Top N. You can do the exact same thing in Lens, and I don't think it has the bug.
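One way to picture the "last bucket" guess is the toy model below. It is not how TSVB is implemented; the sample values, the 10-second interval, and the 25-second lag on the newest sample are all assumptions picked to show the effect.

```python
def bucket_averages(samples, start, end, interval):
    """Average (timestamp, value) samples into fixed-width buckets over [start, end)."""
    buckets = []
    t = start
    while t < end:
        values = [v for ts, v in samples if t <= ts < t + interval]
        # An empty bucket has no value at all, which a chart might show as a dash.
        buckets.append(sum(values) / len(values) if values else None)
        t += interval
    return buckets

# Hypothetical host reporting every 10 seconds; times are seconds from an arbitrary origin.
samples = [(t, 0.15) for t in range(0, 101, 10)]
now = 125  # assumed: the newest sample (t=100) is 25 seconds old

# Window ending at "now": the newest bucket has no sample in it yet.
realtime = bucket_averages(samples, start=now - 30, end=now, interval=10)

# The same window shifted back by an offset: every bucket lies fully in the past.
offset = 60
shifted = bucket_averages(samples, start=now - 30 - offset, end=now - offset, interval=10)

print(realtime[-1])
print(shifted[-1])
```

In this model the unshifted window ends on an empty bucket while the shifted window ends on a populated one. That would match the chart coming back after adding an offset, but it only sketches the idea and confirms nothing about the actual bug.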
