Elastic Agent ECK 7.17.0 Workaround Doesn't Work Correctly

Hi All,

There is a known issue with Elastic Agent managed by ECK where it doesn't work on 7.17.0.

There is a provided workaround: https://github.com/elastic/cloud-on-k8s/issues/5323 (and https://github.com/elastic/cloud-on-k8s/pull/5326/files)

But this doesn't appear to work correctly. For example:

7.16.3 - Startup Logs

2022-02-04T18:32:41.557Z INFO application/application.go:67 Detecting execution mode
2022-02-04T18:32:41.557Z INFO application/application.go:76 Agent is managed locally
2022-02-04T18:32:41.557Z INFO capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2022-02-04T18:32:43.351Z INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2022-02-04T18:32:43.352Z INFO [api] api/server.go:62 Starting stats endpoint
2022-02-04T18:32:43.352Z INFO application/local_mode.go:168 Agent is starting
2022-02-04T18:32:43.352Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2022-02-04T18:32:43.353Z INFO application/local_mode.go:178 Agent is stopped
2022-02-04T18:32:43.353Z INFO application/periodic.go:79 Configuration changes detected
2022-02-04T18:32:43.359Z INFO stateresolver/stateresolver.go:48 New State ID is I1pGg08w
2022-02-04T18:32:43.359Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 2 step(s)
2022-02-04T18:32:51.837Z INFO log/reporter.go:40 2022-02-04T18:32:51Z - message: Application: heartbeat--7.16.3[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:32:54.542Z INFO log/reporter.go:40 2022-02-04T18:32:54Z - message: Application: heartbeat--7.16.3[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-04T18:33:08.843Z INFO log/reporter.go:40 2022-02-04T18:33:08Z - message: Application: filebeat--7.16.3--36643631373035623733363936343635[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:33:18.241Z INFO log/reporter.go:40 2022-02-04T18:33:18Z - message: Application: filebeat--7.16.3--36643631373035623733363936343635[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-04T18:33:29.141Z INFO log/reporter.go:40 2022-02-04T18:33:29Z - message: Application: metricbeat--7.16.3--36643631373035623733363936343635[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:33:29.143Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-04T18:33:29.143Z INFO application/periodic.go:101 No configuration change
2022-02-04T18:33:29.146Z INFO stateresolver/stateresolver.go:48 New State ID is I1pGg08w
2022-02-04T18:33:29.146Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 0 step(s)
2022-02-04T18:33:29.146Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-04T18:33:34.951Z INFO log/reporter.go:40 2022-02-04T18:33:34Z - message: Application: metricbeat--7.16.3--36643631373035623733363936343635[e07f11bb-01dc-4270-91a6-ee08f369505c]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'

7.17.0 - Startup Logs

Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
2022-02-04T18:08:20.944Z INFO application/application.go:67 Detecting execution mode
2022-02-04T18:08:20.944Z INFO application/application.go:76 Agent is managed locally
2022-02-04T18:08:20.944Z INFO capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2022-02-04T18:08:21.753Z INFO [composable.providers.docker] docker/docker.go:43 Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2022-02-04T18:08:21.754Z INFO [api] api/server.go:62 Starting stats endpoint
2022-02-04T18:08:21.754Z INFO application/local_mode.go:168 Agent is starting
2022-02-04T18:08:21.754Z INFO [api] api/server.go:64 Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2022-02-04T18:08:21.754Z INFO application/local_mode.go:178 Agent is stopped
2022-02-04T18:08:21.755Z INFO application/periodic.go:79 Configuration changes detected
2022-02-04T18:08:21.762Z INFO stateresolver/stateresolver.go:48 New State ID is 3xwWCltv
2022-02-04T18:08:21.762Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 2 step(s)
2022-02-04T18:08:35.835Z INFO log/reporter.go:40 2022-02-04T18:08:35Z - message: Application: metricbeat--7.17.0[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:08:43.538Z INFO log/reporter.go:40 2022-02-04T18:08:43Z - message: Application: metricbeat--7.17.0[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-04T18:08:51.036Z INFO log/reporter.go:40 2022-02-04T18:08:51Z - message: Application: filebeat--7.17.0--36643631373035623733363936343635[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:08:55.044Z INFO operation/operator.go:284 operation 'operation-install' skipped for metricbeat.7.17.0
2022-02-04T18:08:56.039Z INFO log/reporter.go:40 2022-02-04T18:08:56Z - message: Application: metricbeat--7.17.0--36643631373035623733363936343635[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
2022-02-04T18:08:56.041Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-04T18:08:56.041Z INFO application/periodic.go:101 No configuration change
2022-02-04T18:08:56.137Z INFO stateresolver/stateresolver.go:48 New State ID is 3xwWCltv
2022-02-04T18:08:56.138Z INFO stateresolver/stateresolver.go:49 Converging state requires execution of 0 step(s)
2022-02-04T18:08:56.138Z INFO stateresolver/stateresolver.go:66 Updating internal state
2022-02-04T18:08:58.939Z INFO log/reporter.go:40 2022-02-04T18:08:58Z - message: Application: filebeat--7.17.0--36643631373035623733363936343635[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
2022-02-04T18:09:03.241Z INFO log/reporter.go:40 2022-02-04T18:09:03Z - message: Application: metricbeat--7.17.0--36643631373035623733363936343635[ab5ebc20-bd2b-4f17-ad22-11dee796e22a]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'

The agent on 7.17.0 loads a completely different config than it does on 7.16.3.

In 7.16.3 the proper config is loaded (note: Heartbeat appears in the logs).

In 7.17.0 a default config is loaded instead (note: no Heartbeat in the logs).
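
One way to confirm which config the running agent actually picked up is to exec into the Pod and dump its active configuration. This is only a sketch: the label selector and Pod name placeholder are assumptions, so adjust them to whatever your Deployment actually produced.

# List the agent Pods (the ECK label used here is an assumption)
kubectl get pods -n elastic-prod -l agent.k8s.elastic.co/name=elastic-agent-test

# Print the configuration the running agent is using; on 7.16.3 the
# synthetics/icmp input from the configRef Secret shows up here, on 7.17.0 it does not
kubectl exec -n elastic-prod <agent-pod> -- elastic-agent inspect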

Agent configs:

7.16.3

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  configRef:
    secretName: elastic-agent-test-config
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - env:
          - name: BEAT_SETUID_AS
            value: root
          name: agent
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
    strategy: {}
  elasticsearchRefs:
  - name: es-prod
    namespace: elastic-prod
  fleetServerRef:
    name: ""
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: ""
  version: 7.16.3

7.17.0

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  configRef:
    secretName: elastic-agent-test-config
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - command:
          - bash
          - -c 
          - |
            #!/usr/bin/env bash
            set -e
            if [[ -f /mnt/elastic-internal/elasticsearch-association/<agent-ns>/<es-name>/certs/ca.crt ]]; then
              cp /mnt/elastic-internal/elasticsearch-association/<agent-ns>/<es-name>/certs/ca.crt /usr/local/share/ca-certificates
              update-ca-certificates
            fi
            /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e
          env:
          - name: BEAT_SETUID_AS
            value: root
          name: agent
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
    strategy: {}
  elasticsearchRefs:
  - name: es-prod
    namespace: elastic-prod
  fleetServerRef:
    name: ""
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: ""
  version: 7.17.0

Note: I did replace both instances of <agent-ns>/<es-name> with the correct path.
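
For reference, with the elasticsearchRefs above (es-prod in namespace elastic-prod), that substitution should presumably end up as:

if [[ -f /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs/ca.crt ]]; then
  cp /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs/ca.crt /usr/local/share/ca-certificates
  update-ca-certificates
fi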

I tried to reproduce your problem but was not able to. I was able to deploy Elastic Agent/Fleet Server and install a simple uptime check via the Kibana UI successfully. Can you share more about the Uptime/Heartbeat configuration you are trying to deploy?

Hi @pebrc,

Here is the full Kubernetes config for the agent:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-test
  namespace: elastic-prod
spec:
  configRef:
    secretName: elastic-agent-test-config
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - env:
          - name: BEAT_SETUID_AS
            value: root
          name: agent
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
    strategy: {}
  elasticsearchRefs:
  - name: es-prod
    namespace: elastic-prod
  fleetServerRef:
    name: ""
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: ""
  version: 7.16.3

Secret with the Agent config:

apiVersion: v1
stringData:
  agent.yml: |- 
    inputs:
      - id: ping-test
        name: ping-test
        revision: 2
        type: synthetics/icmp
        use_output: default
        meta:
          package:
            name: synthetics
            version: 0.5.0
        data_stream:
          namespace: dev
        streams:
          - id: ping-test
            name: ping-test
            type: icmp
            data_stream:
              dataset: icmp
              type: synthetics
            schedule: '@every 10s'
            wait: 1s
            hosts: 
            - example.com
            - example2.com
            timeout: 8s
            tags:
              - test
kind: Secret
metadata:
  name: elastic-agent-test-config
  namespace: elastic-prod
type: Opaque

The above two configs work with 7.16.3, but do not work on 7.17.0 with the workaround applied.
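
For completeness, the content the operator should be picking up from the configRef can be double-checked by decoding the Secret:

kubectl get secret elastic-agent-test-config -n elastic-prod \
  -o jsonpath='{.data.agent\.yml}' | base64 -d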

This actually turns out to be a misunderstanding of the issue and workaround on my part.

I had initially thought that all instances of Elastic Agent on ECK needed this. However, that is incorrect; only the Fleet-managed ones need the workaround.
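
For anyone else who lands here: the workaround only applies to Fleet-mode Agent resources, i.e. specs roughly of this shape. This is just a sketch; the resource names and refs below are made up, and the actual command override from the linked PR would be added under the agent container.

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-fleet       # hypothetical name
  namespace: elastic-prod
spec:
  version: 7.17.0
  mode: fleet
  kibanaRef:
    name: kibana-prod             # hypothetical Kibana resource
  fleetServerRef:
    name: fleet-server-prod       # hypothetical Fleet Server resource
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          # the bash -c command override from the linked PR goes here,
          # same shape as in the 7.17.0 spec earlier in this thread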

One question, though, regarding the workaround:

In 7.16.3 or earlier, the Fleet-managed command started with:

/usr/bin/env

However, the workaround for 7.17.0 omits this first line.
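
If I'm reading the two versions right, the difference is just the first element of the container command array; a sketch of what I mean:

# 7.16.3 and earlier (as generated for Fleet-managed agents):
command:
- /usr/bin/env
- bash
- -c
- |
  ...

# 7.17.0 workaround (from the linked PR):
command:
- bash
- -c
- |
  ...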

Is this intended, or is this line not needed?

I think this was just an oversight.
