I'm facing kind of a weird issue where Metricbeat isn't pulling data from AWS every 60 seconds as defined in the config. Instead it settles into a pattern of roughly 3 minutes on / 3 minutes off. We're going to be setting up alerts on this data, so I want to make sure it's coming in at the correct interval.
Config:
- module: aws
  period: 60s
  access_key_id: '${AWS_ACCESS_KEY_ID_DATAPRE}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY_DATAPRE}'
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/DynamoDB
      name: ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits"]
      tags.resource_type_filter: dynamodb:table
      statistic: ["Average", "Sum"]
      tags:
        - key: "Tenant"
          value: "tenant1"
  regions:
    - us-east-1
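One sanity check before blaming Metricbeat is to query CloudWatch directly and confirm the table actually has 1-minute datapoints; if CloudWatch itself only returns a datapoint every few minutes, no collection period will change that. A minimal boto3 sketch (the table name is a placeholder; region and credentials match the config above):

from datetime import datetime, timedelta, timezone

import boto3

# Pull the last 30 minutes of 1-minute datapoints for one table.
cw = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "my-table"}],  # placeholder table name
    StartTime=end - timedelta(minutes=30),
    EndTime=end,
    Period=60,
    Statistics=["Average", "Sum"],
)
# Gaps between consecutive timestamps here would point at CloudWatch, not Metricbeat.
for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
    print(dp["Timestamp"], dp.get("Sum"))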
I ran it in debug mode and didn't find anything useful. The first entry below shows it publishing 6 metrics from the last pull; it then just ticks away for 3 minutes without reporting any new calls.
2020-03-01T06:06:02.972Z DEBUG [elasticsearch] elasticsearch/client.go:354 PublishEvents: 6 events have been published to elasticsearch in 16.363506ms.
2020-03-01T06:06:02.972Z DEBUG [publisher] memqueue/ackloop.go:160 ackloop: receive ack [1: 0, 6]
2020-03-01T06:06:02.972Z DEBUG [publisher] memqueue/eventloop.go:535 broker ACK events: count=6, start-seq=7, end-seq=12
2020-03-01T06:06:02.972Z DEBUG [publisher] memqueue/ackloop.go:128 ackloop: return ack to broker loop:6
2020-03-01T06:06:02.972Z DEBUG [publisher] memqueue/ackloop.go:131 ackloop: done send ack
2020-03-01T06:06:23.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":140,"time":{"ms":5}},"total":{"ticks":720,"time":{"ms":29},"value":720},"user":{"ticks":580,"time":{"ms":24}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":275367}},"memstats":{"gc_next":27112464,"memory_alloc":18472240,"memory_total":96046320},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":6,"batches":1,"total":6},"read":{"bytes":1516},"write":{"bytes":6594}},"pipeline":{"clients":1,"events":{"active":0,"published":6,"total":6},"queue":{"acked":6}}},"metricbeat":{"aws":{"cloudwatch":{"events":6,"success":6}}},"system":{"load":{"1":2.62,"15":1.54,"5":2.19,"norm":{"1":0.655,"15":0.385,"5":0.5475}}}}}}
2020-03-01T06:06:53.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":140,"time":{"ms":1}},"total":{"ticks":730,"time":{"ms":5},"value":730},"user":{"ticks":590,"time":{"ms":4}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":305368}},"memstats":{"gc_next":27112464,"memory_alloc":18871544,"memory_total":96445624},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":2.51,"15":1.56,"5":2.19,"norm":{"1":0.6275,"15":0.39,"5":0.5475}}}}}}
2020-03-01T06:07:23.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":55}},"total":{"ticks":970,"time":{"ms":254},"value":970},"user":{"ticks":780,"time":{"ms":199}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":335368}},"memstats":{"gc_next":16248080,"memory_alloc":8234520,"memory_total":99273368},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":2.27,"15":1.57,"5":2.16,"norm":{"1":0.5675,"15":0.3925,"5":0.54}}}}}}
2020-03-01T06:07:53.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":1}},"total":{"ticks":980,"time":{"ms":5},"value":980},"user":{"ticks":790,"time":{"ms":4}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":365367}},"memstats":{"gc_next":16248080,"memory_alloc":8458736,"memory_total":99497584},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.54,"15":1.53,"5":2,"norm":{"1":0.385,"15":0.3825,"5":0.5}}}}}}
2020-03-01T06:08:23.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":190,"time":{"ms":1}},"total":{"ticks":1000,"time":{"ms":27},"value":1000},"user":{"ticks":810,"time":{"ms":26}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":395368}},"memstats":{"gc_next":16248080,"memory_alloc":11344064,"memory_total":102382912},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.14,"15":1.5,"5":1.86,"norm":{"1":0.285,"15":0.375,"5":0.465}}}}}}
2020-03-01T06:08:53.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":200,"time":{"ms":4}},"total":{"ticks":1010,"time":{"ms":4},"value":1010},"user":{"ticks":810}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":425367}},"memstats":{"gc_next":16248080,"memory_alloc":11565160,"memory_total":102604008},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1,"15":1.47,"5":1.74,"norm":{"1":0.25,"15":0.3675,"5":0.435}}}}}}
2020-03-01T06:09:23.990Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":200,"time":{"ms":4}},"total":{"ticks":1050,"time":{"ms":35},"value":1050},"user":{"ticks":850,"time":{"ms":31}}},"handles":{"limit":{"hard":65536,"soft":65536},"open":8},"info":{"ephemeral_id":"ff76b39e-8bcc-4955-8a3e-46470bd45b6e","uptime":{"ms":455368}},"memstats":{"gc_next":15973840,"memory_alloc":8187280,"memory_total":105582648},"runtime":{"goroutines":24}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"load":{"1":1.62,"15":1.5,"5":1.8,"norm":{"1":0.405,"15":0.375,"5":0.45}}}}}}
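To quantify the on/off pattern instead of eyeballing the logs, a sketch like this can pull recent event timestamps out of Elasticsearch and print the gaps (the host and index pattern are assumptions on my side):

from datetime import datetime

import requests

# Fetch the @timestamp of the most recent cloudwatch events.
query = {
    "size": 500,
    "sort": [{"@timestamp": "desc"}],
    "_source": ["@timestamp"],
    "query": {"term": {"metricset.name": "cloudwatch"}},
}
hits = requests.post(
    "http://localhost:9200/metricbeat-*/_search", json=query  # assumed host/index
).json()["hits"]["hits"]

# Several events share one collection timestamp, so dedupe before diffing.
times = sorted({
    datetime.strptime(h["_source"]["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    for h in hits
})
for prev, cur in zip(times, times[1:]):
    gap = (cur - prev).total_seconds()
    if gap > 90:  # anything well beyond the configured 60s period
        print(f"gap of {gap:.0f}s ending at {cur}")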
Edit:
As a note, it does not look like this is happening to all data types. There's a custom metric I'm pulling every 60 seconds that seems to be working fine. I'll test a few others tomorrow to see if it's just DynamoDB having this issue.
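For that test, a rough sketch along the same lines, counting datapoints per metric over the last hour so DynamoDB and the custom metric can be compared side by side (the custom namespace, metric name, and dimensions are placeholders):

from datetime import datetime, timedelta, timezone

import boto3

# (namespace, metric name, dimensions) to compare; all placeholders.
checks = [
    ("AWS/DynamoDB", "ConsumedReadCapacityUnits",
     [{"Name": "TableName", "Value": "my-table"}]),
    ("Custom/App", "my_custom_metric", []),  # hypothetical custom metric
]

cw = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)
for namespace, name, dims in checks:
    resp = cw.get_metric_statistics(
        Namespace=namespace, MetricName=name, Dimensions=dims,
        StartTime=end - timedelta(hours=1), EndTime=end,
        Period=60, Statistics=["Sum"],
    )
    # ~60 datapoints means full 1-minute coverage; far fewer means sparse data.
    print(f"{namespace}/{name}: {len(resp['Datapoints'])} datapoints in the last hour")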