Understanding add_process_metadata and add_docker_metadata

Hey everyone,

I am having trouble understanding how these two processors work with a containerised application.

My goal here is to add container.id and other Docker-related labels to my logs in order to:

  1. Have some correlation between a service page in Observability and the logs, so that you can click on the service instance and select container logs

  2. Be able to build dashboards and easily use Docker labels/information to filter my logs

Now onto my setup. Our .NET APIs use Serilog as their logging framework, together with the ECS formatter and the APM package with the Serilog enricher. They are all containerised applications. We use the console sink to send the standard .NET log format to stdout for easy viewing via the docker logs command, and we also use the file sink to write ECS JSON formatted logs to files that are mounted to the host machine under /data/logs/<service_name>/*.json. Those files are in turn collected by a Filebeat instance that is installed directly on each host and uses the file input.

Obviously, as we are not using the docker input to collect the logs directly from the containers' stdout, we lose out on all the Docker metadata attached to the logs. This is where I started to look at using the add_docker_metadata processor in Filebeat, but I have encountered the following issues.

From what I understand (please correct me if I am wrong), the processor can use a few different fields to match a service with the container it is running in, as follows:

  1. match_fields: this looks for a container ID in the listed fields, which currently doesn't exist in our logs, as the ECS formatter doesn't add it and the APM package only seems to add the container.id field to the metrics it sends out, not to the logs via the log enricher.
  2. match_pids: this matches a PID of the application with a container PID to create a correlation and enrich the logs. However, the ECS formatter doesn't populate the process.parent.pid field, and the process.pid field is the PID of the process inside the container, which doesn't correspond to the PID on the host, which is where Filebeat tries to match it.
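
To make the two options above concrete, here is a minimal sketch of how I understand the processor would be configured. This is illustrative only, not my exact config: the socket path is the usual Docker default, and the field names are simply the ones discussed above.

```yaml
processors:
  - add_docker_metadata:
      # Docker socket Filebeat queries for container info (assumed default location)
      host: "unix:///var/run/docker.sock"
      # Option 1: fields expected to hold a container ID.
      # Our ECS logs currently have no container.id, so this never matches.
      match_fields: ["container.id"]
      # Option 2: fields expected to hold a PID that Filebeat can resolve on the host.
      # Our process.pid is the PID inside the container, so this does not match either.
      match_pids: ["process.pid", "process.parent.pid"]
```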

After looking at this, I noticed that with the information currently provided by the ECS formatter and APM packages, this processor would always fail to match, because the fields either don't exist or are invalid for matching. This is where I came across the add_process_metadata processor, which I saw adds the container.id field that we could then use to match with the add_docker_metadata processor. Unfortunately, I ran into the same issue as in the second point above: the PID provided by the ECS formatter is that of the process inside the container, and as Filebeat is running on the host, it tries to match it to a process running on the host instead.
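
The chaining I had in mind looks roughly like the sketch below. Treat it as an assumption-laden illustration rather than a working setup: I am assuming include_fields accepts container.id, and as described above the first step does not resolve correctly for us because process.pid is the in-container PID.

```yaml
processors:
  # Step 1: look up the process by PID on the host and add its container ID.
  - add_process_metadata:
      match_pids: ["process.pid"]
      # Assumption: only add container.id, leaving the existing process.* fields alone
      include_fields: ["container.id"]
      ignore_missing: true
  # Step 2: use the container ID from step 1 to pull in the Docker metadata.
  - add_docker_metadata:
      match_fields: ["container.id"]
```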

Am I missing something obvious in this process, or is the answer to have our application self-report the container ID for us to match on?

Any help would be really appreciated and please ask for any clarifications. Thank you.