Metricbeat 7.8 AWS/RDS | Events missing db_instance.identifier, cluster_identifier, etc

The events I'm receiving from AWS/RDS don't match the shape I expected after reviewing the docs, so I'd like some clarification about the intended behavior. In particular, is it expected that some events are generated without any association to a DB instance, cluster, or engine name? Am I misunderstanding something about scraping RDS metrics through CloudWatch? I'm seeing a number of these 'fragmented' events, which makes it difficult to aggregate the metrics in a useful way for graphing and alerting. I don't see any errors in the Metricbeat logs, even at debug level. The examples below are illustrative:

No identifiers - no way to tell which instance/cluster/engine this is for:

    {
      "_index": "metricbeat-2020.07.10-000017",
      "_type": "_doc",
      "_id": "jVnUOnMB4azz_707Q0oN",
      "_version": 1,
      "_score": null,
      "_source": {
        "@timestamp": "2020-07-10T22:24:11.004Z",
        "cloud": {
          "provider": "aws",
          "region": "us-east-1",
    ...
          }
        },
        "ecs": {
          "version": "1.5.0"
        },
        "host": {
    ...
        },
        "agent": {
    ...
        },
        "event": {
          "dataset": "aws.rds",
          "module": "aws",
          "duration": 4126072412
        },
        "metricset": {
          "name": "rds",
          "period": 60000
        },
        "service": {
          "type": "aws"
        },
        "aws": {
          "rds": {
            "throughput": {
              "insert": 0.49994167347142837,
              "network_receive": 97687.58228755229,
              "ddl": 0,
              "select": 88.28969953505424,
              "update": 0,
              "network": 156748.4459110378,
              "network_transmit": 59060.86362348549,
              "dml": 0.49994167347142837,
              "commit": 0.49994167347142837,
              "delete": 0
            },
            "login_failures": 0,
            "db_instance.class": "db.r5.xlarge",
            "deadlocks": 0,
            "freeable_memory.bytes": 5594017792,
            "aurora_bin_log_replica_lag": 0,
            "disk_usage": {
              "bin_log.bytes": 0
            },
            "cache_hit_ratio.result_set": 30.53247734138973,
            "free_local_storage.bytes": 79721652224,
            "engine_uptime.sec": 9770291,
            "database_connections": 98,
            "cpu": {
              "total": {
                "pct": 0.04
              }
            },
            "latency": {
              "commit": 2.5976,
              "ddl": 0,
              "dml": 0.14663333333333334,
              "update": 0,
              "delete": 0,
              "insert": 0.14663333333333334,
              "select": 0.14652510381275952
            },
            "cache_hit_ratio.buffer": 100,
            "transactions": {
              "active": 0,
              "blocked": 0
            },
            "queries": 107.86992303335221
          }
        }
      },
      "fields": {
        "@timestamp": [
          "2020-07-10T22:24:11.004Z"
        ]
      },
      ...
    }

Partial identifiers - DBInstanceIdentifier but no cluster identifier or engine name (it's part of an aurora-mysql cluster):

    {
      "_index": "metricbeat-2020.07.10-000017",
      "_type": "_doc",
      "_id": "-lnUOnMB4azz_707Q0op",
      "_version": 1,
      "_score": null,
      "_source": {
        "@timestamp": "2020-07-10T22:24:11.004Z",
        "cloud": {
          "availability_zone": "us-east-1b",
          "provider": "aws",
          "region": "us-east-1",
          ...
          }
        },
        "metricset": {
          "name": "rds",
          "period": 60000
        },
        "event": {
          "dataset": "aws.rds",
          "module": "aws",
          "duration": 4127377448
        },
        "service": {
          "type": "aws"
        },
        "ecs": {
          "version": "1.5.0"
        },
        "host": {
          ...
        },
        "agent": {
          ...
          "type": "metricbeat",
          "version": "7.8.0",
          ...
        },
        "aws": {
          "rds": {
            "throughput": {
              "ddl": 0,
              "dml": 0.5,
              "select": 38.1,
              "update": 0,
              "commit": 0.5,
              "delete": 0,
              "network_receive": 31193.1596579829,
              "network": 66075.15375768789,
              "insert": 0.5,
              "network_transmit": 34881.994099704985
            },
            "latency": {
              "update": 0,
              "delete": 0,
              "insert": 0.1658,
              "select": 0.168831583552056,
              "commit": 2.9006333333333334,
              "ddl": 0,
              "dml": 0.1658
            },
            "aurora_bin_log_replica_lag": 0,
            "db_instance": {
              "arn": ...
              "class": "db.r5.large",
              "identifier": "v1bd02mya01",
              "status": "available"
            },
            "free_local_storage.bytes": 29452050432,
            "database_connections": 48,
            "login_failures": 0,
            "queries": 51.214105961368595,
            "deadlocks": 0,
            "db_instance.identifier": "v1bd02mya01",
            "freeable_memory.bytes": 4686135296,
            "disk_usage": {
              "bin_log.bytes": 0
            },
            "cpu": {
              "total": {
                "pct": 0.07
              }
            },
            "cache_hit_ratio.buffer": 100,
            "cache_hit_ratio.result_set": 12.045554095488392,
            "engine_uptime.sec": 13412709,
            "transactions": {
              "active": 0,
              "blocked": 0
            }
          }
        }
      },
      "fields": {
        "@timestamp": [
          "2020-07-10T22:24:11.004Z"
        ]
      },
      ...
    }

With this data, how could I aggregate, say, queries per second, open connections, or other DB-level metrics, grouped by instance and filtered by a particular engine or cluster? There doesn't appear to be any association between the measurements at these different dimensions. I'd be happy to provide any additional information you might need. Any input would be most appreciated, thank you!

Hi! Thanks for posting your question here! I think some AWS/RDS events are missing identifiers because CloudWatch also publishes RDS metrics aggregated "Across All Databases", and those aggregates are not tied to any single instance or cluster.
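
To make that concrete, here is a rough Python sketch that sorts events by the level they seem to be reported at, based only on the fields visible in the two samples above. The labels "instance", "class", and "aggregate" and both function names are made up for illustration. Treating an event that has `db_instance.class` but no identifier as a per-class aggregate is an assumption, not something confirmed in this thread.

```python
from typing import Any, Dict, Optional


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Resolve a dotted path against a dict that may mix nested objects
    and literal dotted keys (both styles appear in the sample events)."""
    if not isinstance(doc, dict):
        return None
    if path in doc:
        return doc[path]
    parts = path.split(".")
    # Try each split point: "a.b.c" may be stored as {"a": {"b.c": ...}} etc.
    for i in range(1, len(parts)):
        head, tail = ".".join(parts[:i]), ".".join(parts[i:])
        if head in doc:
            found = get_path(doc[head], tail)
            if found is not None:
                return found
    return None


def dimension_level(source: Dict[str, Any]) -> str:
    """Guess the level an aws.rds event was reported at.

    The returned labels are hypothetical names chosen for this sketch.
    """
    rds = get_path(source, "aws.rds") or {}
    if get_path(rds, "db_instance.identifier") is not None:
        return "instance"
    if get_path(rds, "db_instance.class") is not None:
        return "class"  # assumption: aggregate over one instance class
    return "aggregate"
```

Run over the `_source` of the two samples, this would label the first event "class" and the second "instance", so only the second kind can be grouped per instance.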

Thank you for the quick response!

That's what I was guessing, but I'm grateful for the confirmation. I've added some tags to my instances, which lets me query data across the different CloudWatch dimensions.
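
In case it helps anyone else, here is a minimal sketch of the kind of query body this enables, built as a plain Python dict. It keeps only events that carry `aws.rds.db_instance.identifier`, optionally narrows them by one extra field, and averages a metric per instance. The function name, the default time window, the bucket size, and the tag field `aws.tags.cluster` in the usage line are all placeholders I picked for the example; check your own index mapping for the real tag field names.

```python
from typing import Any, Dict, Optional


def per_instance_avg_query(
    metric_field: str,
    filter_field: Optional[str] = None,
    filter_value: Optional[str] = None,
    since: str = "now-15m",
) -> Dict[str, Any]:
    """Build an Elasticsearch search body averaging one metric per DB instance."""
    filters = [
        {"term": {"event.dataset": "aws.rds"}},
        # Drop the aggregate events that have no instance identifier.
        {"exists": {"field": "aws.rds.db_instance.identifier"}},
        {"range": {"@timestamp": {"gte": since}}},
    ]
    if filter_field is not None:
        # Optional extra filter, e.g. on a tag field (field name is up to you).
        filters.append({"term": {filter_field: filter_value}})
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "by_instance": {
                "terms": {"field": "aws.rds.db_instance.identifier", "size": 50},
                "aggs": {"avg_metric": {"avg": {"field": metric_field}}},
            }
        },
    }


# Hypothetical usage: average connections per instance for one tagged cluster.
body = per_instance_avg_query(
    "aws.rds.database_connections", "aws.tags.cluster", "my-cluster"
)
```

The resulting dict can be sent as the body of a `_search` request against the `metricbeat-*` indices with whatever client you already use.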

If anyone has advice or best practices for monitoring and alerting on RDS data collected via Metricbeat and stored in Elasticsearch, I'd be most appreciative, but I consider this question resolved. Thanks again!
