Filebeat and Metricbeat produce different Docker metadata for the same container

Hi,

Metricbeat and Filebeat seem to process Docker metadata differently, specifically Docker labels.
This means that for the same container you end up with different document schemas in Elasticsearch.

My first reflex is to think that both Beats should produce documents that are as similar as possible. They don't currently appear to share the code they use to process Docker labels.

I wanted to check whether I missed something or whether this should be raised as a bug, either on Filebeat or on Metricbeat.
Notice below that Filebeat has "docker.container.image" but Metricbeat does not.
Notice also how differently they process labels such as "docker.container.labels.com.amazonaws.ecs.cluster".
(Filebeat expands labels containing dots into nested objects, whereas Metricbeat replaces the dots with underscores.)

The docker object from Filebeat 6.0.0-rc2 (add_docker_metadata processor):

"docker": {
      "container": {
        "labels": {
          "com": {
            "amazonaws": {
              "ecs": {
                "task-definition-version": "1",
                "cluster": "test-main-default-cluster",
                "task-definition-family": "mytask-TaskDef-1A9PDSU36ZZCD",
                "container-name": "ecs-metricbeat",
                "task-arn": "arn:aws:ecs:us-east-1:111111111111:task/81ca4ee2-76ff-43ea-f17b-1c909d527c75"
              }
            }
          },
          "name": "CentOS Base Image",
          "license": "GPLv2",
          "vendor": "CentOS",
          "build-date": "20170705"
        },
        "id": "86d02c540afa0002f95fab50468e3e61ec3d631f4bb81f3774e3d36fbe9b5748",
        "image": "111111111111.dkr.ecr.us-east-1.amazonaws.com/my-services/ecs-metricbeat:dbef8990da31a46b2e9e21b85e5304415b197a72",
        "name": "mytask-TaskDef-1A9PDSU36ZZCD-1-ecs-metricbeat-808187ed8db4eca8dc02"
      }
    },

The docker object from Metricbeat 5.5.0 (docker module):

"docker": {
      "container": {
        "id": "86d02c540afa0002f95fab50468e3e61ec3d631f4bb81f3774e3d36fbe9b5748",
        "labels": {
          "build-date": "20170705",
          "com_amazonaws_ecs_cluster": "test-main-default-cluster",
          "com_amazonaws_ecs_container-name": "ecs-metricbeat",
          "com_amazonaws_ecs_task-arn": "arn:aws:ecs:us-east-1:111111111111:task/81ca4ee2-76ff-43ea-f17b-1c909d527c75",
          "com_amazonaws_ecs_task-definition-family": "mytask-TaskDef-1A9PDSU36ZZCD",
          "com_amazonaws_ecs_task-definition-version": "1",
          "license": "GPLv2",
          "name": "CentOS Base Image",
          "vendor": "CentOS"
        },
        "name": "mytask-TaskDef-1A9PDSU36ZZCD-1-ecs-metricbeat-808187ed8db4eca8dc02"
      },
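
To make the difference concrete, here is a minimal Python sketch that reproduces the two label shapes shown above from the same raw Docker labels. This is not the Beats code (Beats are written in Go); the function names labels_nested and labels_underscored are hypothetical and only mimic the output observed in the two samples.

```python
def labels_nested(labels):
    """Expand dotted label keys into nested objects (shape seen in the Filebeat sample)."""
    out = {}
    for key, value in labels.items():
        parts = key.split(".")
        node = out
        # Walk or create one nested dict per dotted prefix segment.
        # Note: a label that is both a leaf and a prefix (e.g. "com" and "com.x")
        # is not handled in this sketch.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out


def labels_underscored(labels):
    """Replace dots with underscores (shape seen in the Metricbeat sample)."""
    return {key.replace(".", "_"): value for key, value in labels.items()}


# A few of the raw labels from the container above, as Docker reports them.
raw = {
    "com.amazonaws.ecs.cluster": "test-main-default-cluster",
    "com.amazonaws.ecs.container-name": "ecs-metricbeat",
    "vendor": "CentOS",
}

nested = labels_nested(raw)
flat = labels_underscored(raw)

print(nested["com"]["amazonaws"]["ecs"]["cluster"])
print(flat["com_amazonaws_ecs_cluster"])
```

Both lookups return the same value, but the field paths differ ("docker.container.labels.com.amazonaws.ecs.cluster" versus "docker.container.labels.com_amazonaws_ecs_cluster"), so a single query or visualization cannot cover documents from both Beats.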

@martinr_ubi Thanks for bringing this up. I agree the format should be identical where possible; for example, we should format labels the same way. The difference exists because the data comes from two different implementations: the Metricbeat docker module, which collects stats, and the add_docker_metadata processor, which enriches events.

Could you open a GitHub issue so we can get the two formats in sync? @exekias FYI

One note on the "missing" data: the processor should expose only a subset of the data collected by the Metricbeat module, but enough to be useful. So we need to find a balance here.
