AWS CloudWatch metricset not pulling every minute

Facing kind of a weird issue where Metricbeat isn't pulling data from AWS every 60 seconds as defined in the config. It generally falls into a pattern of roughly 3 minutes on, 3 minutes off. We're going to be setting up alerts on this data, so I want to make sure it's coming in at the correct intervals.

Config:

    - module: aws
      period: 60s
      access_key_id: '${AWS_ACCESS_KEY_ID_DATAPRE}'
      secret_access_key: '${AWS_SECRET_ACCESS_KEY_DATAPRE}'
      metricsets:
        - cloudwatch
      metrics:
        - namespace: AWS/DynamoDB
          name: ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits"]
          tags.resource_type_filter: dynamodb:table
          statistic: ["Average", "Sum"]
          tags:
            - key: "Tenant"
              value: "tenant1"
      regions:
        - us-east-1
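
As a side check, something along these lines (a boto3 sketch, assuming the module resolves tags.resource_type_filter through the Resource Groups Tagging API; the values are just taken from the config above) confirms which tables the tag filter actually matches:

    # Hypothetical side check (not part of the Metricbeat config): list the
    # DynamoDB tables matching the tag filter above, assuming the module
    # resolves tags.resource_type_filter via the Resource Groups Tagging API.
    import boto3

    tagging = boto3.client("resourcegroupstaggingapi", region_name="us-east-1")

    resp = tagging.get_resources(
        ResourceTypeFilters=["dynamodb:table"],
        TagFilters=[{"Key": "Tenant", "Values": ["tenant1"]}],
    )

    # Each ARN printed here is a table whose CloudWatch metrics the module
    # should be collecting.
    for mapping in resp["ResourceTagMappingList"]:
        print(mapping["ResourceARN"])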

I ran it in debug and didn't find anything useful. The first event shows it publishing 6 metrics from the last pull; it then just ticks away for 3 minutes without reporting any new calls.

2020-03-01T06:06:02.972Z    DEBUG    [elasticsearch]    elasticsearch/client.go:354    PublishEvents: 6 events have been published to elasticsearch in 16.363506ms.
2020-03-01T06:06:02.972Z    DEBUG    [publisher]    memqueue/ackloop.go:160    ackloop: receive ack [1: 0, 6]
2020-03-01T06:06:02.972Z    DEBUG    [publisher]    memqueue/eventloop.go:535    broker ACK events: count=6, start-seq=7, end-seq=12
2020-03-01T06:06:02.972Z    DEBUG    [publisher]    memqueue/ackloop.go:128    ackloop: return ack to broker loop:6
2020-03-01T06:06:02.972Z    DEBUG    [publisher]    memqueue/ackloop.go:131    ackloop: done send ack
2020-03-01T06:06:23.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":140,"time":{"ms":5}},"total":{"ticks":720,"time":{"ms":29},"value":720},"user":{"ticks":580,"time":{"ms":24}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":275367}},"memstats":{"gc_next":27112464,"memory_alloc":18472240,"memory_total":96046320},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":6,"batches":1,"total":6},"read":{"bytes":1516},"write":{"bytes":6594}},"pipeline":{"clients":1,"events":{"active":0,"published":6,"total":6},"queue":{"acked":6}}},"metricbeat":{"aws":{"cloudwatch":{"events":6,"success":6}}},"system":{"load":{"1":2.62,"15":1.54,"5":2.19,"norm":{"1":0.655,"15":0.385,"5":0.5475}}}}}}
2020-03-01T06:06:53.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":140,"time":{"ms":1}},"total":{"ticks":730,"time":{"ms":5},"value":730},"user":{"ticks":590,"time":{"ms":4}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":305368}},"memstats":{"gc_next":27112464,"memory_alloc":18871544,"memory_total":96445624},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":2.51,"15":1.56,"5":2.19,"norm":{"1":0.6275,"15":0.39,"5":0.5475}}}}}}
2020-03-01T06:07:23.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":55}},"total":{"ticks":970,"time":{"ms":254},"value":970},"user":{"ticks":780,"time":{"ms":199}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":335368}},"memstats":{"gc_next":16248080,"memory_alloc":8234520,"memory_total":99273368},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":2.27,"15":1.57,"5":2.16,"norm":{"1":0.5675,"15":0.3925,"5":0.54}}}}}}
2020-03-01T06:07:53.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":1}},"total":{"ticks":980,"time":{"ms":5},"value":980},"user":{"ticks":790,"time":{"ms":4}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":365367}},"memstats":{"gc_next":16248080,"memory_alloc":8458736,"memory_total":99497584},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.54,"15":1.53,"5":2,"norm":{"1":0.385,"15":0.3825,"5":0.5}}}}}}
2020-03-01T06:08:23.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":1}},"total":{"ticks":1000,"time":{"ms":27},"value":1000},"user":{"ticks":810,"time":{"ms":26}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":395368}},"memstats":{"gc_next":16248080,"memory_alloc":11344064,"memory_total":102382912},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.14,"15":1.5,"5":1.86,"norm":{"1":0.285,"15":0.375,"5":0.465}}}}}}
2020-03-01T06:08:53.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":200,"time":{"ms":4}},"total":{"ticks":1010,"time":{"ms":4},"value":1010},"user":{"ticks":810}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":425367}},"memstats":{"gc_next":16248080,"memory_alloc":11565160,"memory_total":102604008},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1,"15":1.47,"5":1.74,"norm":{"1":0.25,"15":0.3675,"5":0.435}}}}}}
2020-03-01T06:09:23.990Z    INFO    [monitoring    log/log.go:145    Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":200,"time":{"ms":4}},"total":{"ticks":1050,"time":{"ms":35},"value":1050},"user":{"ticks":850,"time":{"ms":31}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":455368}},"memstats":{"gc_next":15973840,"memory_alloc":8187280,"memory_total":105582648},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.62,"15":1.5,"5":1.8,"norm":{"1":0.405,"15":0.375,"5":0.45}}}}}}

Edit:

As a note, it does not look like this is happening to all data types. There's a custom metric I'm pulling every 60 seconds that seems to be working fine. I'll try a few other namespaces tomorrow to see whether it's just DynamoDB having this issue.

Hi @AddChickpeas, the config with period: 60s should work in this case. Could you check in the CloudWatch console to see if there are data points for this DynamoDB table every minute? I know some services only report certain metrics to AWS CloudWatch when there is a value; I'm not sure if that's the case here. I will also try to reproduce it tomorrow in my AWS environment. Thanks!
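
If it helps, a minimal boto3 sketch like the one below (the table name is a placeholder, not from your config) dumps the raw 1-minute data points CloudWatch actually stores, so any gaps between timestamps are easy to spot:

    # Hypothetical check: list 1-minute data points for a DynamoDB table's
    # ConsumedReadCapacityUnits. The table name "my-table" is a placeholder.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DynamoDB",
        MetricName="ConsumedReadCapacityUnits",
        Dimensions=[{"Name": "TableName", "Value": "my-table"}],
        StartTime=end - timedelta(hours=1),
        EndTime=end,
        Period=60,                      # same 60s granularity the module requests
        Statistics=["Average", "Sum"],
    )

    # CloudWatch only returns points for minutes where DynamoDB reported data,
    # so missing minutes show up as gaps in this list.
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Sum"], point["Average"])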

Hi @Kaiyan_Sheng, thanks for responding. I'm thinking it may be something similar as well. It seems to return null values inconsistently; some pulls come back with data, while others don't. From what I can tell, there does usually appear to be a 0 data point in CloudWatch.

I'll need to test more when the application has more usage.

Thanks for checking! If DynamoDB is sending metrics to CloudWatch every minute, then this Metricbeat module should be able to collect them. If there are null/0 values in CloudWatch, then Metricbeat will skip them as well.
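
For reference, here's a rough approximation (placeholder table name, not the module's exact request) of the kind of GetMetricData query the cloudwatch metricset issues; minutes in which the table saw no traffic simply come back without a data point, which then looks like a gap on the Metricbeat side:

    # Hedged approximation of a GetMetricData query at a 60s period;
    # "my-table" is a placeholder. Minutes with no reported activity have
    # no entry in Timestamps/Values at all.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_data(
        MetricDataQueries=[
            {
                "Id": "read_sum",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/DynamoDB",
                        "MetricName": "ConsumedReadCapacityUnits",
                        "Dimensions": [{"Name": "TableName", "Value": "my-table"}],
                    },
                    "Period": 60,
                    "Stat": "Sum",
                },
                "ReturnData": True,
            }
        ],
        StartTime=end - timedelta(minutes=30),
        EndTime=end,
    )

    result = resp["MetricDataResults"][0]
    for ts, value in zip(result["Timestamps"], result["Values"]):
        print(ts, value)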
