Metricbeat Kibana dashboards - "Overview" - intermittent results

Kieren_Johnstone · August 21, 2018, 7:25pm

I'd love some assistance with the below issues, is there anything I can do to diagnose?

Issue 1

I'm looking at a fresh "[Metricbeat Kubernetes] - Overview" dashboard.

The top-left stats (Nodes, Deployments, Desired Pods, Available Pods, Unavailable Pods) intermittently show. I left it refreshing on a 5 sec auto-refresh and here's some results - whether correct values were shown (y) or zeroes (n):

nnnynynynnnynynyn

This is when set to "Last 15 minutes".

"Last 30 minutes" seems to show values for Nodes and Deployments, but the pod counters are usually (75% of the time?) showing zero.

The same happens on other metricbeat dashboards for other stats.

Issue 2

Same dashboard. Intermittently, I get this:

Issue 3

"[Metricbeat Docker] Overview" dashboard now:

If I hide the legend, I see the data, without knowing what it relates to. If I show the legend, the data is distorted and out of bounds.

Issue 4

The top-right area, same dashboard:

Issue 5

"[Metricbeat System] Containers overview" dashboard, minor issue, but the links are cut off and there are scrollbars:

Issue 6

Probably just me, but... billions of nanocores? Could I ask why this is?

ruflin · August 24, 2018, 9:51am

I'm not sure I can follow your Issue 1: could you also share a screenshot here?

In general, it seems that some of the issues you are reporting come down to how Kibana works. For example, scroll bars can appear depending on screen size or browser. I agree it's not nice, and we are trying to fix such things.

Could you share your Metricbeat and Kibana versions and, optionally, your Metricbeat config file?

Note: I moved your topic to the Metricbeat category.

Kieren_Johnstone · August 24, 2018, 10:18am

Hi @ruflin,

Thanks for the response. Indeed, I figured that some of those issues are CSS/layout/etc.; the data is sound, but they are pretty annoying to work around.

Here's a screen recording:

https://kierenj.tinytake.com/sf/Mjg2MzU5MV84NTk0ODk2

Does that make sense?

Oops, sorry - it's all v6.3.2. Kube Manifest (includes config): https://gist.github.com/kierenj/61293824a7a35515bdb6fda3b2a69f91

shaunak · August 27, 2018, 3:40pm

Hi @Kieren_Johnstone,

Thanks for the screen recording. I think the zero/non-zero issue (issue 1) that you are seeing is due to the refresh interval in Kibana being 5s while Metricbeat is reporting data every 10s (as configured via this setting: https://gist.github.com/kierenj/61293824a7a35515bdb6fda3b2a69f91#file-metricbeat-manifest-yml-L191).

As a test, can you try changing the refresh interval in Kibana to 10s and see if the zero/non-zero issue goes away?

Thanks,

Shaunak

Kieren_Johnstone · August 27, 2018, 8:36pm

Hi, hmm, OK. That would be surprising to me: I thought "Last 15 minutes" means it gets data from now minus 15 minutes, regardless of refresh frequency? Either way, setting a 10 sec refresh interval doesn't fix it; in fact it seems to show 0 more often. No luck!

simianhacker · August 27, 2018, 9:48pm

The metric visualizations for number of containers (etc.) show the last bucket of data in a date histogram, not the total for the last 15 minutes. Unfortunately there is no way to guarantee that the last bucket contains valid data; within the team we refer to this as the partial bucket problem. Kibana is making a request for data that's not complete yet. There is a pending PR open that will fix this problem. With that PR in place we need to change the dashboard to do the calculation on the last minute of data instead of the last 30 seconds determined by the auto bucketing. @shaunak was correct about the issue, but the fix is more complex.
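
To make the partial bucket problem concrete, here is a minimal Python sketch. It is not Kibana's code: the function name, the 2-second indexing latency, and the simple count-per-bucket metric are assumptions for illustration. Only the 10s collection period and the 30s auto bucket come from this thread.

```python
def last_bucket_count(now, period=10, bucket=30, latency=2):
    """Count the documents visible in the newest histogram bucket at query time `now`.

    All values are whole seconds. `latency` is an assumed delay between a
    document being collected and becoming searchable.
    """
    # Buckets are aligned to multiples of `bucket`, so the newest one
    # usually covers less than a full bucket of wall-clock time.
    last_bucket_start = (now // bucket) * bucket
    # One document is collected every `period` seconds.
    collected = range(0, now + 1, period)
    return sum(
        1 for t in collected
        if t >= last_bucket_start and t + latency <= now
    )


if __name__ == "__main__":
    # Poll every 5 seconds, like the dashboard auto-refresh in the first post.
    for now in range(900, 935, 5):
        print(now, last_bucket_count(now))
```

In this toy model the newest bucket is empty right after each bucket boundary and only partly filled in between, so a metric that reads just that bucket can flip between zero and a real value from one refresh to the next.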

Kieren_Johnstone · August 28, 2018, 7:27am

Ah I see, that makes sense, thanks.

I don't suppose you'd know what's going on for "Issue 2" (seemingly, all data is grouped into the first timestamp histogram bucket for that one)?

Also: I see on the PR it's maybe down for 6.6 or similar. Am I just incredibly unlucky to have this hitting 50% of the time, is there some config change I can make to improve my situation, or are there lots of users with unusable Kube metricbeat dashboards for the next... few months/year? I'd guess I'm just experiencing it pretty intensely, but I'm not sure why?

shaunak · August 28, 2018, 3:19pm

I looked into the code for this visualization and it looks like there is a derivative involved. When this happens, there can sometimes be a "spike" in the visualization. I think this is what is going on here. @simianhacker can you confirm?

simianhacker · August 28, 2018, 3:23pm

@shaunak @Kieren_Johnstone Yes... that PR I mentioned also trims off that part. It's the result of a partial bucket at the beginning of the data, too.
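
One way a partial first bucket can turn into a derivative spike is sketched below in Python. This is an assumption about the mechanism, not the visualization's actual definition: it supposes the panel sums a cumulative counter per bucket and then takes the difference between consecutive buckets. The counter values and the helper names `bucket_sums` and `derivative` are hypothetical.

```python
def bucket_sums(samples, bucket, start):
    """Sum sample values per time bucket, ignoring samples before `start`."""
    sums = {}
    for t, value in samples:
        if t < start:
            continue
        key = (t // bucket) * bucket
        sums[key] = sums.get(key, 0) + value
    return [sums[key] for key in sorted(sums)]


def derivative(values):
    """Difference between each bucket and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical cumulative counter: sampled every 10 s, growing by 1 per sample.
samples = [(t, 1000 + t // 10) for t in range(0, 150, 10)]

# Time range starts exactly on a bucket boundary: every bucket holds 3 samples.
aligned = derivative(bucket_sums(samples, bucket=30, start=60))

# Time range starts mid-bucket: the first bucket holds only 1 sample.
partial = derivative(bucket_sums(samples, bucket=30, start=50))

if __name__ == "__main__":
    print("aligned:", aligned)
    print("partial:", partial)
```

With these made-up numbers the aligned series is flat, while the series that starts mid-bucket has one very large first value, because the first bucket's sum is built from fewer samples than its neighbour.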

Kieren_Johnstone · August 28, 2018, 7:35pm

Ah, fantastic, thanks. Well, I mean, fantastic to know what it is! I am curious, though, why I see this and presumably the vast majority of people do not. Are there settings I can tweak to improve my situation without the PR?

Kieren_Johnstone · September 3, 2018, 9:11am

@simianhacker Sorry to @ you, but I'm really keen on getting visibility on our cluster.. surely this doesn't affect the majority of people, so might I ask if there's anything I can do to mitigate this problem - say, with my config?

Thank you!

Kieren_Johnstone · September 20, 2018, 6:42am

Can anyone advise? Surely these dashboards must be successfully used by hundreds of people who don't run into this bug?

Kieren_Johnstone · October 15, 2018, 7:45am

@ruflin @shaunak @simianhacker I'm sorry to @ you again - but is there anything I can do at all? Surely many others are using this successfully - is there nothing to be tweaked in the config? Is it really a random bug? Please help!

simianhacker · October 15, 2018, 1:28pm

The best advice I can give you is to go into each visualization and make sure the interval (under panel options) is set to >=1m and that drop last bucket is set to yes. I would also set the dashboard to last 1 hour and then save the time range with the dashboard, and set the refresh to a higher value than your collection interval. So if you are collecting metrics every 10 seconds, I would set the dashboard refresh to 30s.

If you make the changes above and it still doesn't help, then I would try increasing the interval to show data for the last 5 minutes (>=5m), or tweak that to something reasonable for your setup. The issue you're seeing revolves around the delivery of the data and the querying of the data.
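
What these settings are meant to achieve can be sketched in Python as follows. This only models the intent of "drop last bucket" and a wider interval; it is not Kibana's implementation, and the function names, the count-based metric, and the 2-second indexing latency are assumptions.

```python
def histogram(timestamps, bucket, start, end):
    """Count timestamps per bucket over [start, end], keeping empty buckets."""
    first = (start // bucket) * bucket
    counts = {b: 0 for b in range(first, end + 1, bucket)}
    for t in timestamps:
        if start <= t <= end:
            counts[(t // bucket) * bucket] += 1
    return [counts[b] for b in sorted(counts)]


def displayed_value(timestamps, now, bucket, drop_last_bucket, window=3600):
    """Value a last-bucket metric would show for the time range [now - window, now]."""
    buckets = histogram(timestamps, bucket, now - window, now)
    if drop_last_bucket and len(buckets) > 1:
        # Ignore the newest, possibly incomplete bucket.
        buckets = buckets[:-1]
    return buckets[-1]


# Documents collected every 10 s; assume each is searchable 2 s later.
now = 3601
visible = [t for t in range(0, now + 1, 10) if t + 2 <= now]

if __name__ == "__main__":
    print(displayed_value(visible, now, bucket=60, drop_last_bucket=False))
    print(displayed_value(visible, now, bucket=60, drop_last_bucket=True))
```

In this model, querying one second after a bucket boundary reads an empty newest bucket unless that bucket is dropped, in which case the value comes from the previous, complete bucket.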

Kieren_Johnstone · October 16, 2018, 7:26am

Thanks again. I've tried those things but still have the same outcome. In terms of the last paragraph, did you mean the Interval under Panel Options? If I set it to 5m, nothing seems to change - it's still intermittent.

So to be clear, if I look at the "Last 1 hour", set "Drop Last Bucket=yes", and interval to "5m" under Panel Options, this still happens.

I've checked and the latency of metricbeat and filebeat data from the (kubernetes) cluster seems to be under 2 seconds. Is this definitely what's going on? Anything else I can try?

system · November 13, 2018, 9:26am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
