Filebeat to correctly propagate Docker container information into Elasticsearch

Hi all,

I'm struggling to get Filebeat to correctly propagate Docker container information into Elasticsearch.

The events include 'docker.container.id'. However, the ID is suffixed with '-json.log', as shown in the example below, and the container name, image, and labels are not included. This causes problems when using "get logs" from the Kibana "Infrastructure" page: the search criteria include the container ID without the '-json.log' suffix and therefore don't match the corresponding logs (as seen in the image). I've tried various changes to the config; has anyone else had the same problem?

docker.container.id ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3-json.log

Config

#=========================== Filebeat inputs =============================

filebeat.inputs:

- type: docker
  combine_partial: true
  containers:
    path: "/apps/docker/containers"
    stream: "all"
    ids:
      - "*"

#============================= Filebeat modules ===============================

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

  reload.enabled: false

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1

#================================ General =====================================

fields:
  env: dev

#============================== Dashboards =====================================


#============================== Kibana =====================================


#================================ Outputs =====================================

#================================ Processors =====================================

processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  hosts: ["xx.xx.xx.xx:9200"]


Hi @tpnbrown and welcome to the forum.

The default value of containers.path in a docker input definition, used when you don't customize it, is "/var/lib/docker/containers".
ref.

The defaults for the add_docker_metadata processor are that match_source is enabled and match_source_index is 4, which matches "/var/lib/docker/containers/<container_id>/*.log" once the path is split on "/":
var=0 lib=1 docker=2 containers=3 <container_id>=4 filename=5
ref.

So your config instructs add_docker_metadata to extract the container ID from index 4 of "/apps/docker/containers/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3-json.log"

which is "ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3-json.log".

This means Filebeat cannot query the Docker daemon with a valid container ID to obtain further metadata about that container, because the -json.log suffix makes the value invalid as a container ID.
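To make the index arithmetic concrete, here is a minimal Python sketch that models the behaviour described above: split the source path on "/", drop the empty segment produced by the leading slash, and pick the segment at match_source_index. It is only an illustration under that assumption, not Filebeat's own code, and `segment_at` is a hypothetical helper name.

```python
# Illustrative model of how match_source_index picks a path segment.
# Assumption: the source path is split on "/" and empty segments
# (such as the one before the leading slash) are discarded.

CID = "ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3"


def segment_at(path: str, index: int) -> str:
    """Return the 0-based path segment at `index` (hypothetical helper)."""
    segments = [part for part in path.split("/") if part]
    return segments[index]


default_path = f"/var/lib/docker/containers/{CID}/{CID}-json.log"
custom_path = f"/apps/docker/containers/{CID}/{CID}-json.log"

# Default layout: index 4 is the bare container ID.
print(segment_at(default_path, 4))
# Custom layout: index 4 is the log file name, "-json.log" suffix included.
print(segment_at(custom_path, 4))
# Custom layout: index 3 is the bare container ID.
print(segment_at(custom_path, 3))
```

Because "/apps/docker/containers" has one segment fewer than "/var/lib/docker/containers", everything after it shifts one position to the left, which is why index 4 lands on the file name.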

For your containers.path setting of "/apps/docker/containers" in the docker input, you would need to set match_source_index to 3 in the add_docker_metadata processor config, so that it extracts the part of the path that holds a valid container ID:
apps=0 docker=1 containers=2 <container_id>=3 filename=4

The processor would then be able to query the Docker daemon with that ID, and the results would be added to your events along with a correct container ID, without the suffix you see now.
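The rule generalizes: the container ID directory sits directly under containers.path, so its 0-based index equals the number of non-empty segments in containers.path. The Python sketch below computes that, assuming the "<containers.path>/<container_id>/<file>" layout and the split-on-"/" counting shown above; `match_source_index_for` is a hypothetical helper, not a Filebeat setting or API.

```python
# Sketch: derive match_source_index from a docker input's containers.path.
# Assumption: log files live at <containers.path>/<container_id>/<file>,
# and segments are counted from 0 after dropping empty ones.


def match_source_index_for(containers_path: str) -> int:
    """Count the non-empty segments of containers.path (hypothetical helper).

    The container ID directory comes right after the last segment of
    containers.path, so its 0-based index equals that segment count.
    """
    return len([part for part in containers_path.split("/") if part])


for path in ("/var/lib/docker/containers", "/apps/docker/containers"):
    print(path, "->", match_source_index_for(path))
```

For the two paths in this thread it gives 4 for the default location and 3 for "/apps/docker/containers", matching the values discussed above.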

Make sure to check the Filebeat logs if you still have issues, and report what you find here. Filebeat can have problems querying the Docker daemon at "unix:///var/run/docker.sock", for example if Filebeat itself runs in a container and/or lacks permission to access the socket.
ref.
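If you suspect the socket is the problem, a quick check from the Filebeat host can help narrow it down. The Python sketch below is a rough diagnostic of my own, not something Filebeat provides: it assumes the socket path from your "unix:///var/run/docker.sock" host setting and only checks that the path exists, is a Unix socket, and is readable and writable by the user running the script. Run it as the same user Filebeat runs as (and inside the Filebeat container, if it is dockerized).

```python
# Rough diagnostic sketch, not part of Filebeat: check whether the current
# user can read and write the Docker socket that add_docker_metadata uses.
# Assumption: the socket is at /var/run/docker.sock (from the host setting).
import os
import stat
import sys


def check_docker_socket(path: str = "/var/run/docker.sock") -> str:
    """Return a short status string for the given socket path."""
    if not os.path.exists(path):
        return f"{path}: not found (if Filebeat runs in a container, is the socket mounted?)"
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return f"{path}: exists but is not a Unix socket"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path}: permission denied for this user"
    return f"{path}: accessible"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/run/docker.sock"
    print(check_docker_socket(target))
```

A "permission denied" result usually means the user is not allowed to access the socket (commonly controlled by membership in the socket's group); the Filebeat logs should show the corresponding error when the processor tries to connect.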

So, in short, try adjusting your config explicitly like this:

- type: docker
  combine_partial: true
  containers:
    # log file source path will be e.g.: "/apps/docker/containers/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3-json.log"
    path: "/apps/docker/containers"
    stream: "all"
    ids:
      - "*"

processors:
- add_docker_metadata:
    host: "unix:///var/run/docker.sock"
    match_source: true
    # Extract the container ID from the source path split on "/".
    # e.g.: "/apps/docker/containers/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3/ec593a25551aec08dd77ddadc8b2db98113a6201dcab986a593fb686a7e882f3-json.log"
    match_source_index: 3

Also, thanks for quoting the config snippet for readability. Remember that you should always include as much information as possible to elicit helpful answers.
You didn't mention the versions of the components you're using, nor whether you checked the logs; you could have included the logs too. Because your Docker container logs are not in the default location, you could also have provided a full example path to a log file. A correctly quoted example of a full event that is missing the Docker metadata and shows the container ID with the suffix would also have been clearer.
I say that amicably: you quoted your config snippet correctly, and that is critical.

I also wanted to add that it's sometimes better to declare the processor in the input definition itself, so that it is scoped to that input rather than applied globally to all events processed by Filebeat. It can also improve the readability of the config. The impact depends, of course, on how many inputs your Filebeat config has and whether the processor would be extraneous for some of them, or even wasteful.
That is fairly subjective, so I'm only mentioning it because your config also loads modules from modules.d/. Their events would also be processed by your globally defined add_docker_metadata processor, but they are likely not Docker logs.
ref.
ref. 2

I hope that helps; please let us know if you still have issues,

Martin
