Output Event Rate is showing gaps on the APM Stack Monitoring page

Hi, I am using an on-premise Elastic Stack 7.9.0 deployment, and recently we have been facing an OutOfMemory issue on one of the Java agents.
The APM CallTrace objects are taking up more than 2 GB, which is causing the issue.
We are also seeing some gaps in the "Output Event Rate" graph on the APM Stack Monitoring page.

What do these breaks in the graph mean? And is there any remedy for the APM agent OutOfMemory errors?

@vamsikrishna_medeti Sorry for the late reply.

If a query/process is very heavy (causing OOM errors), the ES Stack Monitoring plugin will start throttling the collection rate, which results in gaps on the chart. There are a couple of things we'll need to figure out first.

  • During the periods when the chart shows gaps, are there any logs/errors in the ES/Kibana console?

  • Have you tried identifying the query causing the OOM (usually the slowest one)? You can do this by setting the following in kibana.yml and restarting Kibana:

monitoring.elasticsearch.hosts: ["http://localhost:9200"]
monitoring.elasticsearch.logQueries: true
logging.verbose: true
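
In addition (a sketch only, assuming the default .monitoring-beats-7-* index naming; the thresholds are arbitrary examples), the Elasticsearch search slow log and the tasks API can help catch a slow query from the Elasticsearch side:

# Log queries against the monitoring indices that exceed these thresholds;
# entries go to the Elasticsearch search slow log.
PUT .monitoring-beats-7-*/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}

# List currently running search tasks; detailed=true includes the query
# description, which helps spot a long-running monitoring query in the act.
GET _tasks?actions=*search*&detailed=true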
  • Try running the output event rate query independently and see whether it occasionally times out (or whether the result also has gaps). Be sure to substitute your own cluster_uuid:
GET .monitoring-beats-6-*,.monitoring-beats-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "cluster_uuid": "Q4FBbFszTj6jnCWhBG0Pgw"
          }
        },
        {
          "range": {
            "beats_stats.timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "check": {
      "date_histogram": {
        "field": "beats_stats.timestamp",
        "fixed_interval": "30s"
      },
      "aggs": {
        "metric": {
          "max": {
            "field": "beats_stats.metrics.libbeat.output.events.total"
          }
        },
        "metric_deriv": {
          "derivative": {
            "buckets_path": "event_rate",
            "gap_policy": "skip",
            "unit": "1s"
          }
        },
        "beats_uuids": {
          "terms": {
            "field": "beats_stats.beat.uuid",
            "size": 1
          },
          "aggs": {
            "event_rate_per_beat": {
              "max": {
                "field": "beats_stats.metrics.libbeat.output.events.total"
              }
            }
          }
        },
        "event_rate": {
          "sum_bucket": {
            "buckets_path": "beats_uuids>event_rate_per_beat",
            "gap_policy": "skip"
          }
        }
      }
    }
  }
}
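
If you are unsure of the cluster_uuid, or want to first rule out missing documents, the sketch below is a simplified check (not the exact query the UI runs; the cluster_uuid filter is dropped for brevity, so add it back if this monitoring cluster collects from more than one cluster):

# The root endpoint returns the cluster_uuid to plug into the query above.
GET /

# Count monitoring documents per 30s bucket over the same window. Empty
# buckets during the gap periods would point at throttled or failed
# collection rather than at the chart query itself.
GET .monitoring-beats-6-*,.monitoring-beats-7-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "beats_stats.timestamp": {
        "gte": "now-1h"
      }
    }
  },
  "aggs": {
    "docs_over_time": {
      "date_histogram": {
        "field": "beats_stats.timestamp",
        "fixed_interval": "30s"
      }
    }
  }
}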
  • One more thing to try is different time ranges: instead of the default of 1 hour ago, try 15 minutes, 6 hours, etc. This way we can figure out whether it's a max-buckets issue.
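
For example, here is a trimmed-down variant of the query above over a wider window (a sketch only: it drops the per-beat terms/sum_bucket breakdown, which is fine if only one beat reports into this cluster, and widens fixed_interval along with the range so the bucket count stays low):

GET .monitoring-beats-6-*,.monitoring-beats-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "cluster_uuid": "Q4FBbFszTj6jnCWhBG0Pgw"
          }
        },
        {
          "range": {
            "beats_stats.timestamp": {
              "gte": "now-6h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "check": {
      "date_histogram": {
        "field": "beats_stats.timestamp",
        "fixed_interval": "2m"
      },
      "aggs": {
        "total_events": {
          "max": {
            "field": "beats_stats.metrics.libbeat.output.events.total"
          }
        },
        "events_per_second": {
          "derivative": {
            "buckets_path": "total_events",
            "gap_policy": "skip",
            "unit": "1s"
          }
        }
      }
    }
  }
}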

  • This might also be because the cluster resources are under-provisioned. Have you tried adding nodes or increasing memory (JVM heap)?
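
Before scaling up, it is worth a quick look at how hot the nodes already are. A minimal check using the standard cat nodes API (nothing specific to your setup assumed):

# Per-node heap and CPU pressure at a glance.
GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent,cpu,load_1m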
