Elastic Synthetics Journey: Receiving `permission denied`

I am deploying a journey to Elastic Synthetics using the following command: npm run push.

I'm using Elastic Cloud 8.7.0 and my agent is an elastic-agent-complete:8.7.0 container.

The test pushes to Elastic Cloud as expected; however, it does not run and stays in pending status. When I run the test locally using npx @elastic/synthetics ., it runs as expected and succeeds.

When I check the agent logs for the test ID, these are the log entries I find that appear to be relevant:


{"log.level":"info","@timestamp":"2023-04-14T15:09:48.602Z","message":"Running /usr/share/elastic-agent/.node/node/bin/npm install in /tmp/elastic-synthetics-unzip-3786772281","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/browser-default","type":"synthetics/browser"},"log":{"source":"synthetics/browser-default"},"log.origin":{"file.line":148,"file.name":"source/local.go"},"service.name":"heartbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

{"log.level":"info","@timestamp":"2023-04-14T15:09:51.429Z","message":"Running command: /tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics /tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics /tmp/elastic-synthetics-unzip-3786772281 --playwright-options {\"headless\":true,\"ignoreHTTPSErrors\":false} --screenshots on --throttling 5d/3u/20l --rich-events --match MyApp Dev --params \"{2 hidden params}\" in directory: '/tmp/elastic-synthetics-unzip-3786772281'","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/browser-default","type":"synthetics/browser"},"log":{"source":"synthetics/browser-default"},"service.name":"heartbeat","ecs.version":"1.6.0","log.origin":{"file.line":170,"file.name":"synthexec/synthexec.go"},"ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2023-04-14T15:09:51.429Z","message":"Could not start command /tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics /tmp/elastic-synthetics-unzip-3786772281 --playwright-options {\"headless\":true,\"ignoreHTTPSErrors\":false} --screenshots on --throttling 5d/3u/20l --rich-events --match MyApp Dev --params {\"dev\":{\"id\":\"179aa0f6-7d83-4980-aacf-ee932e7df83b\",\"journey\":\"MyApp Dev\",\"url\":\"https://myapp-dev.example.com/\",\"validation\":\"MyCompany\"},\"prod\":{\"id\":\"c95acd45-b81b-4473-8f6a-fa98f3b175cc\",\"journey\":\"MyApp Prod\",\"url\":\"https://myapp-dev.example.com/\",\"validation\":\"MyCompany\"}} --outfd 3: fork/exec /tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics: permission denied","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/browser-default","type":"synthetics/browser"},"log":{"source":"synthetics/browser-default"},"log.origin":{"file.line":251,"file.name":"synthexec/synthexec.go"},"service.name":"heartbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
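For anyone sifting through similar agent logs, here is a minimal Python sketch that filters JSON log lines down to the browser component's warnings and pulls out the trailing OS error. The helper name `failed_starts` is hypothetical (it is not part of any Elastic tooling), and the sample line is a shortened version of the warning above with the command arguments left out. The field names are the ones visible in the log entries.

```python
import json

def failed_starts(lines, component_id="synthetics/browser-default"):
    """Yield (timestamp, error) for 'Could not start command' warnings."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the log
        if entry.get("log.level") != "warn":
            continue
        if entry.get("component", {}).get("id") != component_id:
            continue
        message = entry.get("message", "")
        if message.startswith("Could not start command"):
            # The OS error is the text after the last ": " in the message.
            yield entry.get("@timestamp"), message.rsplit(": ", 1)[-1]

# Shortened copy of the warning above (arguments between the path and --outfd omitted).
sample = (
    '{"log.level":"warn","@timestamp":"2023-04-14T15:09:51.429Z",'
    '"message":"Could not start command '
    '/tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics'
    ' --outfd 3: fork/exec '
    '/tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/elastic-synthetics'
    ': permission denied",'
    '"component":{"id":"synthetics/browser-default"}}'
)
results = list(failed_starts([sample]))
print(results)
```
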

/tmp/elastic-synthetics-unzip-3786772281/node_modules/.bin/ contains the following:


lrwxrwxrwx 1 root root 34 Apr 14 15:09 elastic-synthetics -> ../@elastic/synthetics/dist/cli.js

lrwxrwxrwx 1 root root 34 Apr 14 15:09 synthetics -> ../@elastic/synthetics/dist/cli.js

This resolves to the directory /tmp/elastic-synthetics-unzip-3786772281/node_modules/@elastic/synthetics/dist, which contains:


-rwxrwx--- 1 elastic-agent elastic-agent 8479 Mar 28 04:07 cli.js*

This file has 0770 permissions, so it should be executable by its owner and group, and it contains the following shebang:


#!/usr/bin/env node

...so I'm uncertain what is causing the permission denied.
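One way to reason about this is to apply the classic Unix owner/group/other check by hand. The Python sketch below is a simplified model: it ignores ACLs, mount options such as noexec, and fine-grained capabilities, and `may_execute` is a hypothetical helper. The uid/gid 1000 for elastic-agent is an assumption here, and 2000 stands in for any unrelated user. It illustrates that 0770 grants execute to the owner and group only, so a process running as anyone else (other than root) would be refused.

```python
import stat

def may_execute(mode, owner_uid, owner_gid, uid, gids):
    """Simplified Unix check: pick exactly one permission class, then test its x bit."""
    if uid == 0:
        # Root (with CAP_DAC_OVERRIDE) may execute if any x bit is set at all.
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)

# cli.js from the listing: -rwxrwx--- elastic-agent elastic-agent
CLI_MODE = 0o770
print("owner:", may_execute(CLI_MODE, 1000, 1000, uid=1000, gids={1000}))
print("other:", may_execute(CLI_MODE, 1000, 1000, uid=2000, gids={2000}))
print("root: ", may_execute(CLI_MODE, 1000, 1000, uid=0, gids={0}))
```

Note that this only covers the file itself. Every directory on the way to the file also needs the search (x) bit for the user doing the exec.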

Experiencing the same issue.
Stuck in pending state with agent log showing "Could not start command /tmp/elastic-synthetics-unzip... : permission denied"

It sounds like you're using zip monitors which have been deprecated for some time now in favor of the more powerful and featureful project monitors. Have you tried those? They should operate more smoothly and make deployment easier.

I don't understand. These ARE project monitors, built in a repository, pushed to our Kibana server, and attempting to run in a private location.

As a prototype, we created a browser-based monitor through Kibana UI. It's been running in that private location without incident. But when we push tests in a project monitor repo, they encounter this permission error.

I apologize, I misread the logfile.

I suspect there's something special about how the container is being launched, I currently have a private location running with that same exact version of the elastic-agent-complete image successfully.

How are you running the container? Are you running it under a special user or mounting any volumes? Is there a customized seccomp policy? I'm looking for anything out of the ordinary that could cause those perms errors.

This is our agent manifest:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
  namespace: elasticsearch
spec:
  version: 8.7.0
  image: docker.elastic.co/beats/elastic-agent-complete:8.7.0
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  mode: fleet
  deployment:
    replicas: 3
    podTemplate:
      spec:
        securityContext:
          runAsUser: 0
          capabilities:
            add: ["NET_RAW", "SETUID"]
        containers:
        - name: agent
          resources:
            limits:
              memory: 4Gi
              cpu: 3
            requests:
              memory: 4Gi
              cpu: 3

I'm not using zip monitors. I'm using project monitors developed with the Node.js @elastic/synthetics library from a project initialized with npx @elastic/synthetics init .

NP

I suspect there's something special about how the container is being launched, I currently have a private location running with that same exact version of the elastic-agent-complete image successfully.

Just to confirm: Scripted browser tests deployed from a project using npm run push are functioning as expected, correct?

How are you running the container? Are you running it under a special user or mounting any volumes? Is there a customized seccomp policy? I'm looking for anything out of the ordinary that could cause those perms errors.

I'm launching using the Elastic Operator. I do have a few additional volumes mounted, in order to add self-signed certificates to the pod at startup, using a customized version of docker-entrypoint that also fixes the issue with elastic-agent that occasionally leaves lockfiles in place when a pod stops.

My Kubernetes manifest is:

---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-synthetics
spec:
  deployment:
    podTemplate:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"
        labels:
          env: dev
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: agent.k8s.elastic.co/name
                    operator: In
                    values:
                    - elastic-synthetics
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - env:
          - name: CONFIG_PATH
            value: /usr/share/elastic-agent
          - name: ELASTIC_AGENT_TAGS
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['env']
          - name: ENV
            valueFrom:
              fieldRef:
                fieldPath: metadata.labels['env']
          - name: HOST_IP
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          envFrom:
          - configMapRef:
              name: elastic-synthetics-env
          - secretRef:
              name: elastic-synthetics-secret
          name: agent
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 1000m
              memory: 2Gi
          securityContext:
            runAsUser: 0
          volumeMounts:
          - mountPath: /usr/local/share/ca-certificates
            name: certs
            readOnly: true
          - mountPath: /usr/local/bin
            name: config
            readOnly: true
          - mountPath: /tmp/nssdb
            name: nssdb
            readOnly: true
        priorityClassName: system-cluster-critical
        securityContext:
          runAsUser: 0
        volumes:
        - name: certs
          secret:
            secretName: athene-certs-secret
        - configMap:
            defaultMode: 493
            name: elastic-synthetics-config
          name: config
        - name: nssdb
          secret:
            defaultMode: 438
            secretName: elastic-synthetics-nssdb-secret
    replicas: 1
    strategy:
      rollingUpdate:
        maxUnavailable: 10%
      type: RollingUpdate
  mode: fleet
  version: 8.7.0
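A side note on the defaultMode values in this manifest: they are serialized as decimal integers, so 493 and 438 are ordinary octal file modes written in base 10. A quick Python check (purely illustrative) converts them back:

```python
import stat

for decimal_mode in (493, 438):
    # stat.filemode expects file-type bits too; S_IFREG marks a regular file.
    print(decimal_mode, oct(decimal_mode), stat.filemode(stat.S_IFREG | decimal_mode))
# 493 0o755 -rwxr-xr-x
# 438 0o666 -rw-rw-rw-
```

That is, 0755 for the config map mounted at /usr/local/bin and 0666 for the nssdb secret.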

Edit

This is a simple browser test that I'm using and which is stuck in pending. It succeeds when run locally.

import { journey, step, monitor, expect } from '@elastic/synthetics';

journey('Google Test', ({ page, params }) => {
  monitor.use({
    id: '2252824d-ba39-454e-8ebd-d749de23279f',
    schedule: 10,
  });

  step('Open https://google.com', async () => {
    await page.goto('https://google.com');
    await expect(page.getByText('Google')).toBeTruthy();
  });
});

I vaguely remember when I first read that I needed to use elastic-agent-complete that there were specific environment variables that needed to be set for synthetics to work properly. However, I haven't been able to find that reference again. Am I remembering correctly? If so, what are these variables and what should the values be so that I can confirm that I have them set?

Updates

Update #1

When I create a multi-step journey in the UI and paste this script in, it fails, but it does attempt to run.

Update #2

I figured out what the issue was and it now runs as expected when I create a multi-step journey with the above script in the UI. Still not working when pushed from CLI.

I found the following in the v8.7 documentation:

Synthetic tests cannot run under the root user. Refer to Synthetics Fleet Quickstart for more information.

I have to wonder whether this is the issue. When I look at the processes running in the container, they're all running as root. Are all synthetics tests pushed through the UI automagically run as elastic-agent? OTOH, if this were the issue, I would expect the lightweight tests I'm pushing via @elastic/synthetics to also fail.

@Andrew_Cholakian1 - Since your deployment is working as expected, can you provide any insight?

When I attempt to deploy a browser test to an on-prem node (e.g., an Elastic Agent deployed to a Linux server, rather than to an elastic-agent-complete container), I get the following error:

I'm also stuck on the same error. Browser monitors run correctly when created through Kibana, but they don't when created as project monitors.

I tried changing the user of my k8s container to "elastic-agent" as described in the docs but it gives me other permission errors while starting up the container.

Same here.

Sorry for the delay here, trying to balance a number of priorities.

@tf4 what are the errors you're getting?

We're probably a week or so out from being able to dedicate some time to look into this unfortunately.

@Andrew_Cholakian1 I'm getting exactly the same error that @DougR mentioned in the beginning of this thread. BTW I'm using ECK and elastic-agent-complete:8.7.0 as well.

When I change the user to elastic-agent (runAsUser: 1000) I keep getting the following error and the container doesn't start.

cp: cannot create regular file '/usr/local/share/ca-certificates/ca.crt': Permission denied

I know this is related to the default volume mounts of the agent but I believe I shouldn't change anything in those volumes.

In my last investigation I checked the permissions of the folders inside /tmp and discovered that my monitors are being copied to the agent with permissions to the root user only. I suspect this is the reason why the elastic-agent user can't execute that .bin/elastic-synthetics command.
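A suspicion like this can be checked by walking every component of the path and testing whether the target user is allowed through. The Python sketch below does that from `os.stat` data alone. `blocking_components` is a hypothetical helper, the check ignores ACLs and capabilities, and it should be run as root so that every directory can be inspected.

```python
import os
import stat

def blocking_components(path, uid, gids):
    """Return the ancestors of `path` (and `path` itself) that deny x to uid/gids."""
    path = os.path.realpath(path)  # follow symlinks such as node_modules/.bin/*
    parts = [path]
    while os.path.dirname(parts[-1]) != parts[-1]:
        parts.append(os.path.dirname(parts[-1]))
    blocked = []
    for component in reversed(parts):  # from "/" down to the file
        st = os.stat(component)
        if uid == st.st_uid:
            bit = stat.S_IXUSR
        elif st.st_gid in gids:
            bit = stat.S_IXGRP
        else:
            bit = stat.S_IXOTH
        if not st.st_mode & bit:
            blocked.append(component)
    return blocked
```

Inside the container, calling `blocking_components(path, uid=1000, gids={1000})` on the unzipped `node_modules/.bin/elastic-synthetics` path would list any directory that the elastic-agent user (uid 1000, per the runAsUser setting mentioned above) cannot traverse. An empty list means the mode bits alone do not explain the error.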

Hi @DougR, @tf4,

Elastic Agent cannot run as a non-root user; this is a known issue for ECK deployments. Any errors you get when changing the user are probably related to that.

Could you check which user and capabilities the heartbeat process is running with? What does the env variable BEAT_SETUID_AS evaluate to inside the container?

We have mechanisms in place to prevent running browser monitors as the root user, and these are probably interfering here. FYI, we introduced a new approach in 8.7.1.

Hi @emilioalvap!

Thanks for the info about running as non-root user.

As you can see in the output below, there are two heartbeat processes running as root.

root       924  0.1  0.5 1503900 88436 ?       Sl   13:22   0:01 /usr/share/elastic-agent/data/elastic-agent-10dc6a/components/heartbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E management.restart_on_output_change=true -E logging.level=info -E logging.to_stderr=true -E gc_percent=${HEARTBEAT_GOGC:100} -E http.enabled=true -E http.host=unix:///usr/share/elastic-agent/state/data/tmp/synthetics-http-default.sock -E path.data=/usr/share/elastic-agent/state/data/run/synthetics/http-default
root       942  0.2  0.5 2565076 89248 ?       Sl   13:22   0:01 /usr/share/elastic-agent/data/elastic-agent-10dc6a/components/heartbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E management.restart_on_output_change=true -E logging.level=info -E logging.to_stderr=true -E gc_percent=${HEARTBEAT_GOGC:100} -E http.enabled=true -E http.host=unix:///usr/share/elastic-agent/state/data/tmp/synthetics-browser-default.sock -E path.data=/usr/share/elastic-agent/state/data/run/synthetics/browser-default

Capabilities for both processes are the same:

cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+ep
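For reference, the same information is available from the `CapEff` line in `/proc/<pid>/status`, which stores the effective set as a hex bitmask. The Python sketch below decodes such a mask using the capability numbers from the Linux `capability.h` header. The mask `00000000a80425fb` is derived here from the list above rather than copied from the container, and `decode_caps` is a hypothetical helper.

```python
# Capability numbers 0..31 as defined in Linux's include/uapi/linux/capability.h.
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid",
    "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
]

def decode_caps(hex_mask):
    """Turn a CapEff-style hex string into a list of capability names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]

print(",".join(decode_caps("00000000a80425fb")))
```

Both cap_setuid and cap_setgid are in the set, so as far as capabilities go the process should be able to switch to another user.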

The env var you asked about has the following value: BEAT_SETUID_AS=elastic-agent.

I updated the agent to 8.7.1 but the problem persists.

Thanks for the info @tf4, I reviewed the scenario again and it turns out your analysis was right on target.

We have had issues with the mitigations in place to prevent running browser monitors under root. A new approach was implemented in 8.7.0 (not 8.7.1, as I thought), which impacts push monitors and other deprecated types. Since we generally recommend not running containers as root, this issue will impact ECK users mostly.

I've raised an issue here to track.
