AWS metrics shipping intervals are unstable and do not match the configured 1-minute period

Yotamloe · October 6, 2020, 4:02pm

Hey everyone. I'm facing an issue when using the AWS module (cloudwatch metricset) with Metricbeat 7.5.2. I'm trying to send metrics data to Elasticsearch every 1 minute, but the data from the AWS/NetworkELB service arrives at unstable intervals, which causes a mismatch with my CloudWatch data when I perform aggregations.

When I check my CloudWatch account, I see a valid data point every 1m. I'm also sending metrics from the AWS/EC2 namespace, and that data arrives at stable intervals. The issue also occurs when I'm using Metricbeat 7.8 and 7.9 with the cloudwatch or elb metricset.

Does someone have an idea of what can cause this issue? Thanks in advance.
Attaching screenshots and metricbeat configuration below:

metricbeat.modules:
- module: aws
  period: 60s
  metricsets:
    - cloudwatch 
  metrics:
    - namespace: AWS/NetworkELB
    - namespace: AWS/EC2
  access_key_id: ''
  secret_access_key: ''

NetworkELB data (unstable):

EC2 data:

Kaiyan_Sheng · October 6, 2020, 11:34pm

Hello! So EC2 data is shipping every 1 min but ELB data is not? Do you consistently see a 1min interval in the AWS CloudWatch portal for both?
Maybe one thing to try here is separating these two namespaces into two sections of the config:

metricbeat.modules:
- module: aws
  period: 60s
  metricsets:
    - cloudwatch 
  metrics:
    - namespace: AWS/NetworkELB
  access_key_id: ''
  secret_access_key: ''
- module: aws
  period: 60s
  metricsets:
    - cloudwatch 
  metrics:
    - namespace: AWS/EC2
  access_key_id: ''
  secret_access_key: ''

It might also be caused by performance issues inside Metricbeat when many metrics are being queried.

Yotamloe · October 7, 2020, 11:56am

Hey Kaiyan, thank you for the response. I receive data from EC2 every 1m, but at unstable intervals from NetworkELB.
I tried separating the Metricbeat AWS configuration just like you said, but I'm still facing the same problem.
The data in my CloudWatch portal is consistent every 1m for both services. Have you ever faced similar issues with data shipping intervals from AWS?
Attaching screenshots below:
metricbeat.yml:

metricbeat.modules:
- module: aws
  period: 60s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EC2
  access_key_id: ''
  secret_access_key: ''

- module: aws
  period: 60s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/NetworkELB
  access_key_id: ''
  secret_access_key: ''

NetworkELB datapoints in my cloudwatch:

NetworkELB datapoints in my elasticsearch:

Kaiyan_Sheng · October 7, 2020, 10:05pm

Hi @Yotamloe, thank you for all the info!! I haven't seen this issue before, but it's definitely worth digging into. Could you narrow down the datapoints in Elasticsearch to the load balancer reset count, so it matches the screenshot from your CloudWatch? Thank you!!

Yotamloe · October 7, 2020, 10:46pm

@Kaiyan_Sheng Sure. Let me know if you have some insights or if you need any more information about this case, and thank you for your help.

Kaiyan_Sheng · October 7, 2020, 11:50pm

Thank you so much!! It does seem like this specific metric is not collected every minute. I wonder if this is caused by a bug we fixed in 7.10: https://github.com/elastic/beats/pull/21498
In this PR, we fixed the event timestamp to use the actual timestamp from CloudWatch instead. I will try to reproduce this issue on my side!

Kaiyan_Sheng · October 7, 2020, 11:53pm

One thing I noticed in the CloudWatch documentation: this metric is only reported when it has a nonzero value. Could you also check in CloudWatch using the Sum statistic instead of SampleCount, and compare the same time range with Elasticsearch, please? Thank you soooo much!!!
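Because a metric that is only reported when nonzero will legitimately have gaps, it can help to fill the missing minutes with zeros before comparing the two sides. The following is a minimal Python sketch of that idea, not anything taken from Metricbeat or the AWS SDK; the function name `fill_missing_minutes` and the sample values are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta


def fill_missing_minutes(datapoints, start, end):
    """Return one value per minute in [start, end).

    `datapoints` maps minute-aligned timestamps to values. Minutes with no
    reported datapoint are filled with 0.0, on the assumption that the
    metric is simply not emitted when its value is zero.
    """
    filled = {}
    current = start
    while current < end:
        filled[current] = datapoints.get(current, 0.0)
        current += timedelta(minutes=1)
    return filled


# Hypothetical sparse series: only two of five minutes reported a value.
start = datetime(2020, 10, 7, 17, 0)
sparse = {
    start: 3.0,
    start + timedelta(minutes=3): 1.0,
}
filled = fill_missing_minutes(sparse, start, start + timedelta(minutes=5))
```

After filling, both series have the same number of points per time range, so any remaining difference points to a collection problem rather than to the metric being zero.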

Yotamloe · October 8, 2020, 11:26am

No problem. I used avg so you can see the exact data point in Elasticsearch; because ELB reports every 1m, it is the same as the sum. There are still mismatches (it's the same timeframe, just different time zones).

Elasticsearch:

CloudWatch:

Yotamloe · October 13, 2020, 7:19am

Hey @Kaiyan_Sheng. Have you been able to reproduce this issue?

Kaiyan_Sheng · October 13, 2020, 5:28pm

Hi! With the Network ELB in my AWS account, there is always a delay on the data points. I think this might be causing your problem! For example:

The current timestamp is 17:10 but the last data point is from 17:05. There is a 5min delay on data coming into CloudWatch. For this case, we introduced a new config option, latency, to make sure we can collect data even with a given delay/latency.

- module: aws
  period: 60s
  latency: 5m
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/NetworkELB

But this config option was only just added in 7.10, so could you wait until it gets released to test it? Or maybe build Metricbeat from source. Sorry for the inconvenience!!
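To illustrate why a latency offset helps, here is a small Python sketch of how shifting the query window back in time can line it up with delayed data. This is an assumption about the general idea, not Metricbeat's actual implementation; the function name `collection_window` is hypothetical.

```python
from datetime import datetime, timedelta


def collection_window(now, period, latency=timedelta(0)):
    """Return (start, end) of the time range to query.

    The window is one `period` long and ends `latency` before `now`.
    Illustrative only: this is not how Metricbeat is known to compute it.
    """
    end_time = now - latency
    start_time = end_time - period
    return start_time, end_time


now = datetime(2020, 10, 13, 17, 10)
period = timedelta(seconds=60)

# Without latency the window covers 17:09 to 17:10, where no data exists yet.
without_latency = collection_window(now, period)

# With a 5-minute latency the window ends at 17:05, the last available point.
with_latency = collection_window(now, period, latency=timedelta(minutes=5))
```

With the example from the post (current time 17:10, last data point at 17:05), the unshifted window queries a range CloudWatch has not populated yet, while the shifted window ends where the most recent data point is.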

system · November 10, 2020, 7:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
