Elastic Agent Standalone Kubernetes ICMP Check; Heartbeat won't Start

Hi All,

I'm trying to set up a standalone Elastic Agent config on Kubernetes, but I'm running into an issue where Heartbeat doesn't start and eventually enters a failed state.

Elastic Agent: 7.16.2
ECK: 1.9.1
Elasticsearch: 7.16.2

Kubernetes config:

---
apiVersion: v1
kind: Secret
metadata:
  name: elastic-agent-test-config
  namespace: elastic-prod
stringData:
  agent.yml: |-
    inputs:
      - id: 5a394a8f-211e-41bf-9f4e-e9fe29591af5
        name: ping-test
        revision: 2
        type: synthetics/icmp
        use_output: default
        meta:
          package:
            name: synthetics
            version: 0.5.0
        data_stream:
          namespace: dev
        streams:
          - id: synthetics/icmp-icmp-5a394a8f-211e-41bf-9f4e-e9fe29591af5
            name: ping-test
            type: icmp
            data_stream:
              dataset: icmp
              type: synthetics
            schedule: '@every 10s'
            wait: 1s
            hosts: host.example.com
            timeout: 8s
            tags:
              - test
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  configRef:
    secretName: elastic-agent-test-config
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
  elasticsearchRefs:
  - name: es-prod
    namespace: elastic-prod
  fleetServerRef:
    name: ""
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: ""
  version: 7.16.2

Logs of when the agent starts up:

2022-02-01T21:05:23.196Z INFO application/application.go:67 Detecting execution mode
2022-02-01T21:05:23.197Z INFO application/application.go:76 Agent is managed locally
2022-02-01T21:05:23.197Z INFO capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2022-02-01T21:05:26.682Z INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2022-02-01T21:05:26.682Z INFO [api] api/server.go:62 Starting stats endpoint
2022-02-01T21:05:26.682Z INFO application/local_mode.go:168 Agent is starting
2022-02-01T21:05:26.683Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2022-02-01T21:05:26.683Z INFO application/local_mode.go:178 Agent is stopped
2022-02-01T21:05:26.684Z INFO application/periodic.go:79 Configuration changes detected
2022-02-01T21:05:26.781Z INFO stateresolver/stateresolver.go:48 New State ID is AsbxjRZj
2022-02-01T21:05:26.781Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 2 step(s)
2022-02-01T21:05:34.179Z INFO log/reporter.go:40 2022-02-01T21:05:34Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RESTARTING: exited with code: 1 - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:35.881Z INFO log/reporter.go:40 2022-02-01T21:05:35Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:50.194Z INFO log/reporter.go:40 2022-02-01T21:05:50Z - message: Application: filebeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:05:59.080Z INFO log/reporter.go:40 2022-02-01T21:05:59Z - message: Application: filebeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:07.787Z INFO log/reporter.go:40 2022-02-01T21:06:07Z - message: Application: metricbeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-01T21:06:07.876Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-01T21:06:07.876Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:48 New State ID is AsbxjRZj
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 0 step(s)
2022-02-01T21:06:07.879Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-01T21:06:13.687Z INFO log/reporter.go:40 2022-02-01T21:06:13Z - message: Application: metricbeat--7.16.2--36643631373035623733363936343635[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:17.877Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:27.878Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:36.693Z WARN status/reporter.go:236 Elastic Agent status changed to: 'degraded'
2022-02-01T21:06:36.693Z INFO log/reporter.go:40 2022-02-01T21:06:36Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'
2022-02-01T21:06:37.879Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:47.879Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:06:57.880Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:07.881Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:17.882Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:27.883Z INFO application/periodic.go:101 No configuration change
2022-02-01T21:07:36.702Z ERROR status/reporter.go:236 Elastic Agent status changed to: 'error'
2022-02-01T21:07:36.702Z ERROR log/reporter.go:36 2022-02-01T21:07:36Z - message: Application: heartbeat--7.16.2[649ef4d9-0207-4392-8529-7a6e0d2e517d]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'

There are no logs under /usr/share/elastic-agent/state/data/logs/default/, so I think the issue occurs before Heartbeat ever actually starts.

Debug logs didn't appear to be of much help either.

Thanks for the report, @BenB196, and sorry for the issues.

There are two issues here:

  1. Our error reporting in this situation is poor.
  2. ICMP isn't working in this situation.

We recently fixed the issue where ICMP errors could be fatal to Heartbeat in elastic/beats pull request #29413 ("[Heartbeat] Defer monitor / ICMP errors to monitor runtime / ES", by andrewvc). Would you be able to try this out on 7.17.0 and let us know which error you're getting?

As a note, the errors (should) show up in the Uptime UI now, instead of requiring you to dig through logs.

@Andrew_Cholakian1 thanks for the information. I did see that mention in the release notes yesterday and thought it might be somewhat related. I'll take a look at testing with 7.17 to see if that solves the issue/shows any errors.

Hi @BenB196,

After further investigation, we have logged an issue to address this scenario: "[Heartbeat][Agent] Heartbeat won't start when executed as root". The issue should also apply to 7.17.0 and later versions.

Until this is fixed, you can work around it by either running the agent as the built-in user (elastic-agent, uid 1000) or setting the environment variable BEAT_SETUID_AS: "root" in your container config.
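For reference, here is a minimal sketch of how the second workaround might look when applied to the Agent resource from the original post. The resource names and securityContext are copied from that post; placing the variable under the agent container's `env` list in the pod template is an assumption about where "your container config" lives in an ECK-managed deployment, so adjust it to your setup.

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            allowPrivilegeEscalation: false
            # Still running as root, as in the original config.
            # The alternative workaround is to run as the built-in user
            # (uid 1000) instead of setting the variable below.
            runAsUser: 0
          env:
          # Workaround named in the reply above; the placement under the
          # agent container's env list is an assumption.
          - name: BEAT_SETUID_AS
            value: "root"
```

Only the `env` entry is new relative to the original manifest; the remaining fields of the spec (configRef, elasticsearchRefs, version, and so on) are omitted here and would stay as they were.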

Thanks @emilioalvap for the update. The reason I think it needs to run as root in the first place is that, when starting as non-root, I ran into elastic/beats issue #27651, "[elastic-agent][heartbeat] Heartbeat binary should have setcap privs for ICMP ping". I'm attempting to use the ICMP probe, and I don't think there is currently any workaround for running on Kubernetes other than running as root.
