I am currently using Elastic Cloud v8.6.1 with Elastic Agent Standalone v8.6.0 deployed to EKS, running Kubernetes v1.22.16 in our non-production cluster and v1.21.14 in our production cluster (to be upgraded to 1.22.14 this weekend). I have also confirmed this issue with Elastic Agent v8.6.1 in both EKS clusters.
Since upgrading to Elastic Agent v8.6.x, I've started using condition-based autodiscovery for Kubernetes pods in Elastic Agent. When pod logs rotate, they use a copytruncate strategy, and Elastic Agent continues ingesting the logs as expected. However, when a new pod starts, Elastic Agent does not appear to start collecting its logs until the agent is restarted. This includes when:
- A new deployment happens on a node for the first time (i.e., no prior instances of that deployment were running).
- A pod restarts on the same node.
- A pod stops on one node and a new instance starts on another node (i.e., when a pod "switches nodes").
I have also tested this with filebeat-8.6.1 deployed to my cluster using the older-style `filebeat.autodiscover.providers` hints-based autodiscovery, and it does not appear to exhibit the same behavior.
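For reference, the Filebeat DaemonSet used a hints-based autodiscover block along these lines (a sketch based on the documented example; the `default_config` shown is illustrative rather than my exact manifest):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          # one log file per container, matched by container ID
          - /var/log/containers/*${data.kubernetes.container.id}.log
```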
Here is a sample input configuration, taken from a pod for which this is particularly noticeable, since it generally terminates after about two hours and a new instance is scheduled to another node:
```yaml
---
inputs:
  - id: 'filestream-myapp-0d9ea285-223f-4b6a-9207-9432b0eac168'
    name: 'filestream-myapp'
    type: 'filestream'
    use_output: 'default'
    data_stream:
      namespace: 'default'
    streams:
      # mycontainer
      - id: 'logs-${kubernetes.pod.name}-${kubernetes.container.id}'
        condition: '${kubernetes.container.name} == "mycontainer"'
        data_stream:
          dataset: 'myapp.log'
          type: 'logs'
        parsers:
          - container:
              stream: 'all'
              format: 'auto'
          - ndjson:
              expand_keys: true
              ignore_decoding_error: true
              overwrite_keys: true
              target: ''
        paths: ['/var/log/containers/*${kubernetes.container.id}.log']
        pipeline: 'logs-myapp.log'
        processors:
          - add_locale:
              format: 'offset'
        prospector.scanner.symlinks: true
        tags: ['myapp']
```
I do not currently have `providers.kubernetes.hints.enabled: true` set (the documentation for condition-based autodiscovery does not require it), but I have tested it both ways with the same result.
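For the tests with hints enabled, the provider block in the standalone agent policy looked roughly like this (a sketch; `NODE_NAME` is assumed to be injected via the downward API, as in the stock manifest):

```yaml
providers:
  kubernetes:
    node: ${NODE_NAME}
    scope: 'node'
    # toggled between the default (disabled) and enabled across tests;
    # the result was the same either way
    hints:
      enabled: true
```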
What appears to be happening is that Elastic Agent isn't tracking pod scheduling events, so it doesn't pick up new containers when they're scheduled. There are three possibilities I can see:
- There's a bug in Elastic Agent.
- I have an issue in my Elastic Agent config.
- I have the wrong permissions set in my `ClusterRole` (see below).

```yaml
---
apiVersion: 'rbac.authorization.k8s.io/v1'
kind: 'ClusterRole'
metadata:
  name: 'elastic-agent'
rules:
  - apiGroups: ['']
    resources:
      - 'configmaps'
      - 'events'
      - 'namespaces'
      - 'nodes'
      - 'pods'
      - 'services'
      # Needed for cloudbeat
      - 'persistentvolumeclaims'
      - 'persistentvolumes'
      - 'serviceaccounts'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['']
    resources: ['nodes/stats']
    verbs: ['get']
  - apiGroups: ['apps']
    resources:
      - 'daemonsets'
      - 'deployments'
      - 'replicasets'
      - 'statefulsets'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['batch']
    resources:
      - 'cronjobs'
      - 'jobs'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['coordination.k8s.io']
    resources: ['leases']
    verbs: ['get', 'create', 'update']
  - apiGroups: ['extensions']
    resources: ['replicasets']
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['storage.k8s.io']
    resources:
      - 'storageclasses'
    verbs: ['get', 'watch', 'list']
  # Needed for cloudbeat
  - apiGroups: ['rbac.authorization.k8s.io']
    resources:
      - 'clusterrolebindings'
      - 'clusterroles'
      - 'rolebindings'
      - 'roles'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['policy']
    resources: ['podsecuritypolicies']
    verbs: ['get', 'watch', 'list']
  # Needed for apiserver
  - nonResourceURLs: ['/metrics']
    verbs: ['get']
```
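For completeness, the `ClusterRole` is bound to the agent's service account with a standard `ClusterRoleBinding` (a sketch of the binding from the stock Elastic Agent manifest; the `elastic-agent` name and `kube-system` namespace are that manifest's defaults, not copied verbatim from my cluster):

```yaml
---
apiVersion: 'rbac.authorization.k8s.io/v1'
kind: 'ClusterRoleBinding'
metadata:
  name: 'elastic-agent'
subjects:
  - kind: 'ServiceAccount'
    name: 'elastic-agent'
    # namespace assumed from the stock manifest
    namespace: 'kube-system'
roleRef:
  kind: 'ClusterRole'
  name: 'elastic-agent'
  apiGroup: 'rbac.authorization.k8s.io'
```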