Hey everyone. I'm facing an issue with the AWS module (cloudwatch metricset) in Metricbeat 7.5.2. I'm trying to send metrics to Elasticsearch every 1 minute, but the data from the AWS/NetworkELB namespace arrives at unstable intervals, which causes a mismatch with my CloudWatch data when I perform aggregations.
When I check my CloudWatch account, I see a data point every 1m and it looks valid. I'm also sending metrics from the AWS/EC2 namespace, and that data arrives at stable intervals. The issue also occurs with Metricbeat 7.8 and 7.9, with either the cloudwatch or the elb metricset.
Does someone have an idea of what can cause this issue? Thanks in advance.
Attaching screenshots and metricbeat configuration below:
Hello! So EC2 data is shipping every 1 min, but ELB data is not? Do you see a 1 min interval consistently for both in the AWS CloudWatch portal?
One thing to try here is separating these two namespaces into two sections of the config:
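The configuration originally attached to this reply is not preserved here. A minimal sketch of what such a split might look like, assuming the standard aws module options (period, metricsets, metrics with a namespace); credentials and regions are left out and would need to match your own setup:

```yaml
# Sketch only: two separate aws module sections, one per namespace.
# Credential and region settings are intentionally omitted.
- module: aws
  period: 1m
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EC2

- module: aws
  period: 1m
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/NetworkELB
```

With the namespaces split like this, each section is collected independently, which makes it easier to see whether only the AWS/NetworkELB section shows gaps.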
Hey Kaiyan, thank you for the response. I receive data from EC2 every 1m, but at unstable intervals from NetworkELB.
I tried separating the Metricbeat AWS configuration as you suggested, but I'm still facing the same problem.
The data in my CloudWatch portal is consistent every 1m for both services. Have you ever faced similar issues with data shipping intervals from AWS?
Attaching screenshots below:
Metricbeat.yml:
Hi @Yotamloe, thank you for all the info!! I haven't seen this issue before, but it's definitely worth digging into. Could you narrow down the data points in Elasticsearch to the load balancer reset count, so it matches the screenshot from your CloudWatch? Thank you!!
Thank you so much!! It does seem like this specific metric is not collected per minute. I wonder if this is caused by a bug we fixed in 7.10: https://github.com/elastic/beats/pull/21498
In this PR, we fixed the event timestamp to use the actual timestamp from CloudWatch instead. I will try to reproduce this issue on my side!
One thing I noticed in the CloudWatch documentation: this metric is only reported if it has a nonzero value. Could you also check in CloudWatch using the Sum statistic instead of SampleCount, and compare the same time range with Elasticsearch, please? Thank you soooo much!!!
No problem. I used Average so you can see the exact data points in Elasticsearch; because the ELB reports every 1m, it is the same as Sum. There are still mismatches (it's the same timeframe, just different time zones).
The current timestamp is 17:10 but the last data point is from 17:05, so there is a 5 min delay on data coming into CloudWatch. For this case, we introduced a new config option, latency, to make sure we can collect data even with a given delay.
But this config option was only added in 7.10, so could you wait until it is released to test it? Or maybe build Metricbeat from source. Sorry for the inconvenience!!
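For reference, a hedged sketch of how the latency option might be set once 7.10 is available, assuming it sits at the module level and takes a duration. The 5m value is only taken from the delay observed above and may need tuning:

```yaml
# Sketch only, assumes Metricbeat 7.10 or later.
- module: aws
  period: 1m
  # Assumed value: matches the ~5 min delay seen between the current time
  # and the last CloudWatch data point for this namespace.
  latency: 5m
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/NetworkELB
```

The idea is that each collection asks CloudWatch for a time window that ends 5 minutes in the past rather than at the current time, so data points that arrive late are still picked up.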