[elastic-agent][beats] Problem when running as a non-root randomized UID

OpenShift has a security policy that requires containers to run with a randomized UID, which causes problems for the Elastic Agent. The standard solution is to make the user a member of group 0 (root). In this context the root group grants no special privileges, but it gives consistent file access no matter which random UID is assigned.
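For reference, the identity that matters here can also be read from inside a process. The Go sketch below is illustrative only and is not part of Elastic Agent. It prints the effective UID and GID and the supplementary groups, and reports whether group 0 is among them. The helper name `inRootGroup` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// inRootGroup reports whether GID 0 (root) is the effective group or one of
// the supplementary groups. Under OpenShift's random-UID policy, this group
// membership is what keeps files in the image accessible.
func inRootGroup(egid int, groups []int) bool {
	if egid == 0 {
		return true
	}
	for _, g := range groups {
		if g == 0 {
			return true
		}
	}
	return false
}

func main() {
	groups, err := os.Getgroups()
	if err != nil {
		fmt.Fprintln(os.Stderr, "getgroups:", err)
		os.Exit(1)
	}
	fmt.Printf("euid=%d egid=%d groups=%v inRootGroup=%v\n",
		os.Geteuid(), os.Getegid(), groups, inRootGroup(os.Getegid(), groups))
}
```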

The elastic-agent-complete image (and likely the standard elastic-agent image) uses the following IDs:

```
podman run --rm -it --entrypoint /usr/bin/id elastic/elastic-agent-complete:9.3.1
uid=1000(elastic-agent) gid=1000(elastic-agent) groups=1000(elastic-agent),0(root)
```

Starting the container results in numerous errors such as:

```json
{
  "log.level": "error",
  "@timestamp": "2026-03-05T10:02:07.365Z",
  "message": "Failed to list light metricsets for module tomcat: getting metricsets for module 'tomcat': loading light module 'tomcat' definition: loading module configuration from '/usr/share/elastic-agent/data/elastic-agent-2ec825/components/module/tomcat/module.yml': config file (\"/usr/share/elastic-agent/data/elastic-agent-2ec825/components/module/tomcat/module.yml\") must be owned by the user identifier (uid=1000730000) or root",
  "component": {
    "binary": "metricbeat",
    "dataset": "elastic_agent.metricbeat",
    "id": "beat/metrics-monitoring",
    "type": "beat/metrics"
  },
  "service.name": "metricbeat",
  "log.logger": "registry.lightmodules",
  "log.origin": {
    "file.line": 145,
    "file.name": "mb/lightmodules.go",
    "function": "github.com/elastic/beats/v7/metricbeat/mb.(*LightModulesSource).ModulesInfo"
  },
  "resource": {
    "service.instance.id": "fe0de429-1ead-41a0-8785-e5a3890390db",
    "service.name": "/usr/share/elastic-agent/data/elastic-agent-2ec825/components/elastic-otel-collector",
    "service.version": "9.3.1"
  },
  "otelcol.component.id": "metricbeatreceiver/_agent-component/beat/metrics-monitoring",
  "otelcol.signal": "logs",
  "log": {
    "source": "beat/metrics-monitoring"
  },
  "ecs.version": "1.6.0",
  "otelcol.component.kind": "receiver"
}
```

Initially, I assumed the method only checked whether the files were writable by the process. Even though the error message is quite explicit, I expected it not to fail as long as the process could write to the file. I created a minimal Dockerfile to test this:

```dockerfile
FROM elastic/elastic-agent-complete:9.3.1

USER root
RUN usermod -g 0 elastic-agent && \
    find / -not -path "/proc/*" -user elastic-agent -exec chmod g+u {} \;  && \
    find / -not -path "/proc/*" -group elastic-agent -exec chown elastic-agent:root {} \; && \
    groupdel elastic-agent
USER elastic-agent
```

With this change, the root group should have all the permissions the owner normally has, but the error persists. I then examined the Beats source code in `libbeat/common/config.go` (elastic/beats, `main` branch):

```go
func OwnerHasExclusiveWritePerms(name string) error {
```

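This function checks that the file is owned by the effective user ID (EUID) of the running process, or by root, and that only the owner can write to it. With a randomized UID, the ownership part of the check always fails, because the files in the image are owned by a fixed UID rather than by the random UID assigned at runtime. This happens even though the check technically accepts files owned by root.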
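The ownership half of the check reduces to a single condition. The Go sketch below is a simplified model derived from the error message and the description above, not the Beats source; `ownerCheck` is a hypothetical name. It assumes the image file is owned by elastic-agent (UID 1000) and uses the random UID 1000730000 from the log.

```go
package main

import "fmt"

// ownerCheck models the ownership half of the check: the file must be owned
// by the process's effective UID or by root (UID 0). This is an illustrative
// restatement, not the Beats implementation.
func ownerCheck(name string, fileUID, euid int) error {
	if fileUID != 0 && fileUID != euid {
		return fmt.Errorf("config file (%q) must be owned by the user identifier (uid=%d) or root", name, euid)
	}
	return nil
}

func main() {
	const randomUID = 1000730000 // random UID from the log above

	// File owned by elastic-agent (UID 1000, assumed) under a random UID: fails.
	fmt.Println(ownerCheck("module.yml", 1000, randomUID))
	// Root-owned file: passes this half of the check.
	fmt.Println(ownerCheck("module.yml", 0, randomUID))
	// Default UID 1000 without randomization: passes.
	fmt.Println(ownerCheck("module.yml", 1000, 1000))
}
```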

For files to stay writable across random UIDs, the group (GID 0, root, as in images prepared by Red Hat) should have 'rw' or 'rwx' permissions:

```
podman run --rm --entrypoint /usr/bin/id registry.access.redhat.com/ubi10/httpd-24
uid=1001(default) gid=0(root) groups=0(root)
```

Another potential workaround would be to make all files owned by root, but looking at the code, the second part of the check is:

```go
	// Test if group or other have write permissions.
	if perm&0022 > 0 {
		nameAbs, err := filepath.Abs(name)
		if err != nil {
			nameAbs = name
		}
		return fmt.Errorf(`config file ("%v") can only be writable by the `+
			`owner but the permissions are "%v" (to fix the permissions use: `+
			`'chmod go-w %v')`,
			name, perm, nameAbs)
	}
```
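The octal mask 0022 combines the group-write bit (0020) and the other-write bit (0002), so any group-writable mode is rejected, including the 0664/0775 modes that the group-0 convention relies on. The Go sketch below simply evaluates the same expression for a few example modes; `groupOrOtherWritable` is a hypothetical name.

```go
package main

import (
	"fmt"
	"io/fs"
)

// groupOrOtherWritable evaluates the same mask test as the excerpt above:
// 0o022 selects the group-write (0o020) and other-write (0o002) bits.
func groupOrOtherWritable(perm fs.FileMode) bool {
	return perm&0o022 != 0
}

func main() {
	modes := []fs.FileMode{0o644, 0o600, 0o664, 0o775, 0o646}
	for _, m := range modes {
		// Print the octal mode, its symbolic form, and whether the check rejects it.
		fmt.Printf("%04o %v rejected=%v\n", uint32(m), m, groupOrOtherWritable(m))
	}
}
```

Only 0644 and 0600 pass here; 0664 and 0775 fail because of the group-write bit, and 0646 because of the other-write bit.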

As a result, the check fails consistently in hardened environments where group-write permissions are required for the container to work, even when the files are owned by root.

IMO the current check should be changed to:

  • Allowing group access when the EUID does not match the file owner
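A minimal sketch of what the proposed relaxation could look like, written as a standalone Go function. This is a hypothetical design, not existing Beats code, and `relaxedCheck`/`containsGID` are illustrative names. When the EUID matches the owner, today's owner-only write rule applies. On an EUID mismatch, the file is accepted if its GID is one of the process's groups (for example GID 0 under OpenShift, assumed here), which also permits group write. World-writable files are rejected in every case, and root-owned files that are not group-writable still pass, as they do today.

```go
package main

import (
	"fmt"
	"io/fs"
)

// containsGID reports whether gid is in groups.
func containsGID(groups []int, gid int) bool {
	for _, g := range groups {
		if g == gid {
			return true
		}
	}
	return false
}

// relaxedCheck is a hypothetical variant of the permission check that allows
// group access when the EUID does not match the file owner.
func relaxedCheck(fileUID, fileGID, euid int, groups []int, perm fs.FileMode) error {
	// World-writable files are never acceptable.
	if perm&0o002 != 0 {
		return fmt.Errorf("file must not be writable by others (perm %v)", perm)
	}
	// The process owns the file: keep the current owner-only write rule.
	if fileUID == euid {
		if perm&0o020 != 0 {
			return fmt.Errorf("file owned by euid=%d must not be group-writable (perm %v)", euid, perm)
		}
		return nil
	}
	// EUID mismatch (e.g. a random OpenShift UID): accept via group membership.
	if containsGID(groups, fileGID) {
		return nil
	}
	// Otherwise fall back to today's rule: root-owned and not group-writable.
	if fileUID == 0 && perm&0o020 == 0 {
		return nil
	}
	return fmt.Errorf("file uid=%d gid=%d is not owned by euid=%d, root, or one of its groups",
		fileUID, fileGID, euid)
}

func main() {
	const randomUID = 1000730000 // random UID from the log above
	groups := []int{0}           // assumed: the container runs with GID 0

	// File owned by UID 1000, group 0, mode 0664 (assumed result of the Dockerfile).
	fmt.Println(relaxedCheck(1000, 0, randomUID, groups, 0o664))
	// World-writable files are still rejected.
	fmt.Println(relaxedCheck(1000, 0, randomUID, groups, 0o666))
}
```

The trade-off is that trust extends to every member of the file's group; on OpenShift that group is 0, which random-UID containers are expected to share.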