I am currently using Elastic Cloud v8.6.1 with Elastic Agent Standalone v8.6.0 deployed to EKS, running Kubernetes v1.22.16 in our non-production cluster and v1.21.14 in our production cluster (to be upgraded to 1.22.14 this weekend). I have also confirmed this issue with Elastic Agent v8.6.1 in both EKS clusters.
Since upgrading to Elastic Agent v8.6.x, I've started using condition-based autodiscovery for Kubernetes pods in Elastic Agent. When pod logs rotate, they use a copytruncate strategy, and Elastic Agent continues ingesting the logs as expected. However, when a new pod starts, Elastic Agent does not appear to start collecting its logs until the agent is restarted. This includes when:
- A new deployment happens on a node for the first time (i.e., no prior instances of that deployment were running).
- A pod restarts on the same node.
- A pod stops on one node and a new instance starts on another node (i.e., when a pod "switches nodes").
I have also tested this with filebeat-8.6.1 deployed to my cluster using the older-style `filebeat.autodiscover.providers` hints-based autodiscovery, and it does not appear to exhibit the same behavior.
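For reference, the Filebeat DaemonSet used a hints-based autodiscover block along these lines (a sketch based on the documented example; the `default_config` shown is illustrative rather than my exact manifest):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          # one log file per container, matched by container ID
          - /var/log/containers/*${data.kubernetes.container.id}.log
```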
Here is a sample input configuration, taken from a pod for which this is particularly noticeable, since it generally terminates after about two hours and a new instance is scheduled to another node:
```yaml
---
inputs:
  - id: 'filestream-myapp-0d9ea285-223f-4b6a-9207-9432b0eac168'
    name: 'filestream-myapp'
    type: 'filestream'
    use_output: 'default'
    data_stream:
      namespace: 'default'
    streams:
      # mycontainer
      - id: 'logs-${kubernetes.pod.name}-${kubernetes.container.id}'
        condition: '${kubernetes.container.name} == "mycontainer"'
        data_stream:
          dataset: 'myapp.log'
          type: 'logs'
        parsers:
          - container:
              stream: 'all'
              format: 'auto'
          - ndjson:
              expand_keys: true
              ignore_decoding_error: true
              overwrite_keys: true
              target: ''
        paths: ['/var/log/containers/*${kubernetes.container.id}.log']
        pipeline: 'logs-myapp.log'
        processors:
          - add_locale:
              format: 'offset'
        prospector.scanner.symlinks: true
        tags: ['myapp']
```
I do not currently have `providers.kubernetes.hints.enabled: true` set (the documentation for condition-based autodiscovery does not require it), but I have tested it both ways with the same result.
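For the tests with hints enabled, the provider block in the standalone agent policy looked roughly like this (a sketch; `NODE_NAME` is assumed to be injected via the downward API, as in the stock manifest):

```yaml
providers:
  kubernetes:
    node: ${NODE_NAME}
    scope: 'node'
    # toggled between the default (disabled) and enabled across tests;
    # the result was the same either way
    hints:
      enabled: true
```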
What appears to be happening is that Elastic Agent isn't tracking pod scheduling events, so it doesn't pick up new containers when they're scheduled. There are three possibilities I can see:
- There's a bug in Elastic Agent.
- I have an issue in my Elastic Agent config.
- I have the wrong permissions set in my `ClusterRole` (see below).

```yaml
---
apiVersion: 'rbac.authorization.k8s.io/v1'
kind: 'ClusterRole'
metadata:
  name: 'elastic-agent'
rules:
  - apiGroups: ['']
    resources:
      - 'configmaps'
      - 'events'
      - 'namespaces'
      - 'nodes'
      - 'pods'
      - 'services'
      # Needed for cloudbeat
      - 'persistentvolumeclaims'
      - 'persistentvolumes'
      - 'serviceaccounts'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['']
    resources: ['nodes/stats']
    verbs: ['get']
  - apiGroups: ['apps']
    resources:
      - 'daemonsets'
      - 'deployments'
      - 'replicasets'
      - 'statefulsets'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['batch']
    resources:
      - 'cronjobs'
      - 'jobs'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['coordination.k8s.io']
    resources: ['leases']
    verbs: ['get', 'create', 'update']
  - apiGroups: ['extensions']
    resources: ['replicasets']
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['storage.k8s.io']
    resources:
      - 'storageclasses'
    verbs: ['get', 'watch', 'list']
  # Needed for cloudbeat
  - apiGroups: ['rbac.authorization.k8s.io']
    resources:
      - 'clusterrolebindings'
      - 'clusterroles'
      - 'rolebindings'
      - 'roles'
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['policy']
    resources: ['podsecuritypolicies']
    verbs: ['get', 'watch', 'list']
  # Needed for apiserver
  - nonResourceURLs: ['/metrics']
    verbs: ['get']
```
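For completeness, the `ClusterRole` is bound to the agent's service account with a standard `ClusterRoleBinding` (a sketch of the binding from the stock Elastic Agent manifest; the `elastic-agent` name and `kube-system` namespace are that manifest's defaults, not copied verbatim from my cluster):

```yaml
---
apiVersion: 'rbac.authorization.k8s.io/v1'
kind: 'ClusterRoleBinding'
metadata:
  name: 'elastic-agent'
subjects:
  - kind: 'ServiceAccount'
    name: 'elastic-agent'
    # namespace assumed from the stock manifest
    namespace: 'kube-system'
roleRef:
  kind: 'ClusterRole'
  name: 'elastic-agent'
  apiGroup: 'rbac.authorization.k8s.io'
```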