Metricbeat AWS Module: dashboard not working and aws.sqs.messages.visible multiplied by 5

Hello,

Using the example configuration for the AWS SQS metricset from: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-aws-sqs.html

AWS data is retrieved and indexed, but after that there are two issues I can't fix:

  • The AWS SQS dashboard shows my region and queues in the filters, but all the visualizations show "No data to display for the selected metrics".
  • The aws.sqs.messages.visible value is consistently multiplied by 5 in the index.
    It can take up to two 300s intervals, but it always ends up at 5 times the value shown in AWS.

I started this setup on the 7.5.2 stack and couldn't find the issue.
On a cleanly installed Docker 7.6.0 stack I get exactly the same behaviour.

This is the only configuration I have done after the clean install:

```yaml
- module: aws
  period: 300s
  metricsets:
    - sqs
  access_key_id: 'XXXXXXXXXXXX'
  secret_access_key: 'XXXXXXXXXXXX'
  regions:
    - eu-west-1
```

Any idea what I'm missing here?

Configuring both the SQS metricset and the CloudWatch metricset with the AWS/SQS metrics gives some more insight.

Looking at ApproximateNumberOfMessagesVisible in the CloudWatch result, there is a count of 5 and a SUM of 5 times the number of visible messages.
Only this SUM is shown in the SQS metricset result.
The same goes for ApproximateAgeOfOldestMessage: the SQS metricset uses the SUM from the CloudWatch result.
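To illustrate the arithmetic, here is a minimal sketch of how the five one-minute CloudWatch datapoints collected over a 300s period reduce under each statistic; the sample value of 26 matches the CloudWatch document posted here:

```python
# Five one-minute CloudWatch datapoints for ApproximateNumberOfMessagesVisible,
# matching the cloudwatch metricset document in this post (queue depth is 26).
datapoints = [26, 26, 26, 26, 26]

count = len(datapoints)     # 5   -> "count" in the cloudwatch document
total = sum(datapoints)     # 130 -> what aws.sqs.messages.visible reports (Sum)
average = total / count     # 26  -> the value the AWS console shows (Average)

print(count, total, average)
```

This is exactly the 5x multiplication: the queue never had 130 visible messages, it had 26, sampled five times.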

From the SQS metricset:

```json
{
  "_index" : "metricbeat-7.6.0-2020.02.19-000001",
  "_type" : "_doc",
  "_id" : "COITXHABWMJAjxUqPU2-",
  "_score" : 1.0,
  "_source" : {
    "@timestamp" : "2020-02-19T06:12:07.825Z",
    "agent" : {
      "hostname" : "1e702c874f42",
      "id" : "88c860bc-d744-4f78-b468-c6de280d4e2d",
      "version" : "7.6.0",
      "type" : "metricbeat",
      "ephemeral_id" : "180ee828-1143-4d60-a22b-061c1d83c874"
    },
    "metricset" : {
      "name" : "sqs",
      "period" : 300000
    },
    "service" : {
      "type" : "aws"
    },
    "aws" : {
      "sqs" : {
        "oldest_message_age" : {
          "sec" : 414603
        },
        "messages" : {
          "received" : 0,
          "sent" : 0,
          "delayed" : 0,
          "not_visible" : 0,
          "visible" : 130,
          "deleted" : 0
        },
        "empty_receives" : 0,
        "sent_message_size" : { },
        "queue" : {
          "name" : "Import-dev-DeadLetterQueue-11S5VYXEN1V8"
        }
      }
    },
    "cloud" : {
      "account" : {
        "name" : "xxxxxxxxxxxxxxx",
        "id" : "111111111111111"
      },
      "provider" : "aws",
      "region" : "eu-west-1"
    },
    "event" : {
      "dataset" : "aws.sqs",
      "module" : "aws",
      "duration" : 902316100
    },
    "ecs" : {
      "version" : "1.4.0"
    },
    "host" : {
      "name" : "1e702c874f42"
    }
  }
},
```

From the CloudWatch metricset:

```json
{
  "_index" : "metricbeat-7.6.0-2020.02.19-000001",
  "_type" : "_doc",
  "_id" : "C-ITXHABWMJAjxUqQk1B",
  "_score" : 0.13353139,
  "_source" : {
    "@timestamp" : "2020-02-19T06:12:08.750Z",
    "service" : {
      "type" : "aws"
    },
    "cloud" : {
      "provider" : "aws",
      "region" : "eu-west-1",
      "account" : {
        "name" : "xxxxxxxxxxxxxxx",
        "id" : "111111111111111"
      }
    },
    "aws" : {
      "cloudwatch" : {
        "namespace" : "AWS/SQS"
      },
      "dimensions" : {
        "QueueName" : "Import-dev-DeadLetterQueue-11S5VYXEN1V8"
      },
      "sqs" : {
        "metrics" : {
          "NumberOfEmptyReceives" : {
            "avg" : 0,
            "max" : 0,
            "min" : 0,
            "sum" : 0,
            "count" : 5
          },
          "ApproximateNumberOfMessagesDelayed" : {
            "max" : 0,
            "min" : 0,
            "sum" : 0,
            "count" : 5,
            "avg" : 0
          },
          "NumberOfMessagesSent" : {
            "count" : 5,
            "avg" : 0,
            "max" : 0,
            "min" : 0,
            "sum" : 0
          },
          "ApproximateAgeOfOldestMessage" : {
            "max" : 83040,
            "min" : 82800,
            "sum" : 414603,
            "count" : 5,
            "avg" : 82920.6
          },
          "ApproximateNumberOfMessagesVisible" : {
            "count" : 5,
            "avg" : 26,
            "max" : 26,
            "min" : 26,
            "sum" : 130
          },
          "NumberOfMessagesReceived" : {
            "count" : 5,
            "avg" : 0,
            "max" : 0,
            "min" : 0,
            "sum" : 0
          },
          "NumberOfMessagesDeleted" : {
            "min" : 0,
            "sum" : 0,
            "count" : 5,
            "avg" : 0,
            "max" : 0
          },
          "ApproximateNumberOfMessagesNotVisible" : {
            "max" : 0,
            "min" : 0,
            "sum" : 0,
            "count" : 5,
            "avg" : 0
          }
        }
      }
    },
    "event" : {
      "duration" : 1192613900,
      "dataset" : "aws.cloudwatch",
      "module" : "aws"
    },
    "metricset" : {
      "name" : "cloudwatch",
      "period" : 300000
    },
    "host" : {
      "name" : "1e702c874f42"
    },
    "agent" : {
      "type" : "metricbeat",
      "ephemeral_id" : "180ee828-1143-4d60-a22b-061c1d83c874",
      "hostname" : "1e702c874f42",
      "id" : "88c860bc-d744-4f78-b468-c6de280d4e2d",
      "version" : "7.6.0"
    },
    "ecs" : {
      "version" : "1.4.0"
    }
  }
},
```

Reducing the 300s period from the documentation to 60s fixes the count of 5, and as a result the SUM now equals the MAX value.

The dashboard is still showing: "No data to display for the selected metrics"

Configuring the EC2 metricset and the CloudWatch metricset with the AWS/EC2 metrics in the default configuration and comparing the data shows that:

The EC2 metricset uses the AVG value, not the SUM like the SQS metricset does.
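The difference in statistic likely matters because of the metric type. This is a rough sketch under that assumption (the counter values are made up for illustration): for a gauge like ApproximateNumberOfMessagesVisible, averaging the datapoints reproduces the console value, while for a counter like NumberOfMessagesSent, summing the per-minute counts is the meaningful total:

```python
# Gauge: queue depth sampled once per minute (stays at 26 the whole period).
gauge_samples = [26, 26, 26, 26, 26]
# Counter: messages sent during each one-minute window (hypothetical values).
counter_samples = [3, 0, 7, 1, 4]

gauge_value = sum(gauge_samples) / len(gauge_samples)  # Average -> 26, matches the console
counter_value = sum(counter_samples)                   # Sum -> 15 messages over 5 minutes

print(gauge_value, counter_value)
```

Summing a gauge, as the SQS metricset does here, just multiplies the reading by the number of datapoints.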

After some modifications the SQS dashboard is now working:

  • set the period for sqs to 60s instead of 300s; this logs "sqs/sqs.go:55 period needs to be set to 300s (or a multiple of 300s)" on startup, but it works.
  • edit the SQS visualizations:
    set "Group by" to "aws.sqs.queue.name" (empty by default)
    set the Panel Options interval to 1m (5m by default)
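For reference, the adjusted module configuration from the first step might look like this (credentials redacted as before):

```yaml
- module: aws
  period: 60s   # workaround: triggers the 300s warning on startup, but yields count: 1
  metricsets:
    - sqs
  access_key_id: 'XXXXXXXXXXXX'
  secret_access_key: 'XXXXXXXXXXXX'
  regions:
    - eu-west-1
```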

Could this be an issue in the AWS module?
If the SQS metricset used the AVG value, and "Group by: aws.sqs.queue.name" were set in the default visualizations, it would all work OOTB.

Hi @glaenen, thanks for posting it here. You are absolutely right about the statistic method used for the sqs metricset. I just created a PR to change it from Sum to Average: https://github.com/elastic/beats/pull/16438

For the dashboard: if you wait more than 5 minutes, until the second collection period finishes, the data should start to show up, because right now the dashboard by default drops the last data point. I'm also trying to change this in the same PR.

Thanks again for thoroughly reporting this issue.

Hello @Kaiyan_Sheng , thanks for looking into this.

