Elastic Agent Standalone Kubernetes ICMP Check; Heartbeat won't Start

Hi All,

I'm trying to set up a standalone Elastic Agent config on Kubernetes, but I'm running into an issue where Heartbeat doesn't start and eventually enters a failed state.

Elastic Agent: 7.16.2
ECK: 1.9.1
Elasticsearch: 7.16.2

Kubernetes config:

---
apiVersion: v1
kind: Secret
metadata:
  name: elastic-agent-test-config
  namespace: elastic-prod
stringData:
  agent.yml: |-
    inputs:
      - id: 5a394a8f-211e-41bf-9f4e-e9fe29591af5
        name: ping-test
        revision: 2
        type: synthetics/icmp
        use_output: default
        meta:
          package:
            name: synthetics
            version: 0.5.0
        data_stream:
          namespace: dev
        streams:
          - id: synthetics/icmp-icmp-5a394a8f-211e-41bf-9f4e-e9fe29591af5
            name: ping-test
            type: icmp
            data_stream:
              dataset: icmp
              type: synthetics
            schedule: '@every 10s'
            wait: 1s
            hosts: host.example.com
            timeout: 8s
            tags:
              - test
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  configRef:
    secretName: elastic-agent-test-config
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
  elasticsearchRefs:
  - name: es-prod
    namespace: elastic-prod
  fleetServerRef:
    name: ""
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: ""
  version: 7.16.2

Logs of when the agent starts up:

2022-02-01T21:05:23.196Z INFO application/application.go:67 Detecting execution mode
2022-02-01T21:05:23.197Z INFO application/application.go:76 Agent is managed locally
2022-02-01T21:05:23.197Z INFO capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2022-02-01T21:05:26.682Z INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2022-02-01T21:05:26.682Z INFO [api] api/server.go:62 Starting stats endpoint
2022-02-01T21:05:26.682Z INFO application/local_mode.go:168 Agent is starting
2022-02-01T21:05:26.683Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2022-02-01T21:05:26.683Z INFO application/local_mode.go:178 Agent is stopped
2022-02-01T21:05:26.684Z INFO application/periodic.go:79 Configuration changes detected
2022-02-01T21:05:26.781Z INFO stateresolver/stateresolver.go:48 New State ID is AsbxjRZj
2022-02-01T21:05:26.781Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 2 step(s)
2022-02-01T21:05:34.179Z INFO log/reporter.go:40 2022-02-01T21:05:34Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RESTARTING: exited with code: 1 - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:50.194Z INFO log/reporter.go:40 2022-02-01T21:05:50Z - message: Application: filebeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:59.080Z INFO log/reporter.go:40 2022-02-01T21:05:59Z - message: Application: filebeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:07.787Z INFO log/reporter.go:40 2022-02-01T21:06:07Z - message: Application: metricbeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:06:07.876Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-01T21:06:07.876Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:48 New State ID is AsbxjRZj
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 0 step(s)
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-01T21:06:13.687Z INFO log/reporter.go:40 2022-02-01T21:06:13Z - message: Application: metricbeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:17.877Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:27.878Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:36.693Z WARN status/reporter.go:236 Elastic Agent status changed to: 'degraded'
2022-02-01T21:06:36.693Z INFO log/reporter.go:40 2022-02-01T21:06:36Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:37.879Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:47.879Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:57.880Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:07.881Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:17.882Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:27.883Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:36.702Z ERROR status/reporter.go:236 Elastic Agent status changed to: 'error'
2022-02-01T21:07:36.702Z ERROR log/reporter.go:36 2022-02-01T21:07:36Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'

There are no logs under /usr/share/elastic-agent/state/data/logs/default/, so I think the issue occurs before Heartbeat ever actually starts.

Debug logs didn't appear to be of much help either.

Thanks for the report, @BenB196, and sorry for the issues.

There are two issues here:

  1. Our error reporting in this situation is poor.
  2. ICMP isn't working in this situation.

We recently fixed the problem where ICMP errors could be fatal to Heartbeat in elastic/beats pull request #29413, "[Heartbeat] Defer monitor / ICMP errors to monitor runtime / ES" by andrewvc. Would you be able to try this out on 7.17.0 and let us know which error you're getting?

As a note, the errors should now show up in the Uptime UI, instead of requiring you to dig through logs.

@Andrew_Cholakian1 thanks for the information. I did see that mention in the release notes yesterday and thought it might be somewhat related. I'll take a look at testing with 7.17 to see if that solves the issue/shows any errors.

Hi @BenB196,

After further investigation, we have logged an issue to address this scenario: "[Heartbeat][Agent] Heartbeat won't start when executed as root". It should also apply to 7.17.0 and later versions.

Until it is fixed, as a workaround, you can either run the agent with the built-in user (elastic-agent, uid 1000) or set the environment variable BEAT_SETUID_AS: "root" in your container config.
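
For reference, here is a rough sketch of how the environment variable option might look when applied to the Agent resource from the original post. This is only an illustration: the placement under the `agent` container's `env` list is an assumption based on the standard Kubernetes pod template layout, and only the relevant part of the spec is shown.

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
          # Assumed placement: the workaround variable is set as a regular
          # container environment variable on the agent container.
          env:
          - name: BEAT_SETUID_AS
            value: "root"
```

The alternative workaround would be to change `runAsUser` from 0 to 1000 instead of adding the `env` entry.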

Thanks @emilioalvap for the update. The reason I think it needs to run as root in the first place is that, when starting as non-root, I ran into elastic/beats issue #27651, "[elastic-agent][heartbeat] Heartbeat binary should have setcap privs for ICMP ping". I'm attempting to use the ICMP probe, and I don't think there is currently any workaround on Kubernetes other than running as root.