Hello,
It's been days me and a colleague trying to configure elastic-agent to collect kubernetes. I'll explain our stack after this:
Before:
- version of all elastic components: 8.11.4
- kube-state-metrics properly deployed
- environment : air-gapped, no internet connection to outside
- elastic deployed on premise
- kubernetes cluster deployed and pods working fine
- Before we had everything working fine, we had elastic agent to collect system metrics and logs from VMs, and metricbeat was deployed using ECK to collect kubernetes metrics.
After
We had to replace metricbeat deployed earlier with elastic-agent of integration type kubernetes, and in addition of the already running elastic-agent on VMs that collect system metrics, I don't know if it's the correct way or not to have 2 agents running: one for system and the second one for kubernetes, or only the elastic-agent on kubernetes is enough to gather both system and kubernetes metrics.
The problem is that we are getting many errors once the elastic-agent is starting, the error saying:
{"log.level":"error","@timestamp":"2024-08-16T13:59:57.201Z","message":"Error
dialing x509: certificate signed by unknown
authority","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"log.logger":"esclientleg","log.origin":{"file.line":38,"file.name":"transport/logging.go"},"service.name":"metricbeat","network":"tcp","address":"m090-sup-prd1:9200","ecs.version":"1.6.0","ecs.version":"1.6.0"}
sometimes we are getting an error of: DNS lookup failure m090-sup-prd1 no such host, and the weird part is when metricbeat was deployed it was able to access the elasticsearch host on m090-sup-prd1.
below the configuration of elastic-agent:
---
# For more information https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-managed-by-fleet.html
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: elastic-agent
namespace: kube-system
labels:
app: elastic-agent
spec:
selector:
matchLabels:
app: elastic-agent
template:
metadata:
labels:
app: elastic-agent
spec:
# Tolerations are needed to run Elastic Agent on Kubernetes control-plane nodes.
# Agents running on control-plane nodes collect metrics from the control plane components (scheduler, controller manager) of Kubernetes
# tolerations:
# - key: node-role.kubernetes.io/control-plane
# effect: NoSchedule
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
serviceAccountName: elastic-agent
hostNetwork: true
# 'hostPID: true' enables the Elastic Security integration to observe all process exec events on the host.
# Sharing the host process ID namespace gives visibility of all processes running on the same host.
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: elastic-agent
image: nexus.forge-dc.cloudmi.minint.fr/rrf/elk/agent/elasticagent:8.11.4
env:
# Set to 1 for enrollment into Fleet server. If not set, Elastic Agent is run in standalone mode
- name: FLEET_ENROLL
value: "1"
# Set to true to communicate with Fleet with either insecure HTTP or unverified HTTPS
- name: FLEET_INSECURE
value: "true"
# Fleet Server URL to enroll the Elastic Agent into
# FLEET_URL can be found in Kibana, go to Management > Fleet > Settings
- name: FLEET_URL
value: "https://m090-sup-prd1:8220"
# Elasticsearch API key used to enroll Elastic Agents in Fleet (https://www.elastic.co/guide/en/fleet/current/fleet-enrollment-tokens.html#fleet-enrollment-tokens)
# If FLEET_ENROLLMENT_TOKEN is empty then KIBANA_HOST, KIBANA_FLEET_USERNAME, KIBANA_FLEET_PASSWORD are needed
- name: FLEET_ENROLLMENT_TOKEN
value: "RHVFVVVKRUJlQjRpdUo3V3U3Zl86emJLODdVWHJRZnFsczM1QXdRQURCdw=="
- name: KIBANA_HOST
value: "http://kibana:5601"
# The basic authentication username used to connect to Kibana and retrieve a service_token to enable Fleet
- name: KIBANA_FLEET_USERNAME
value: "elastic"
# The basic authentication password used to connect to Kibana and retrieve a service_token to enable Fleet
- name: KIBANA_FLEET_PASSWORD
value: "changeme"
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# The following ELASTIC_NETINFO:false variable will disable the netinfo.enabled option of add-host-metadata processor. This will remove fields host.ip and host.mac.
# For more info: https://www.elastic.co/guide/en/beats/metricbeat/current/add-host-metadata.html
- name: ELASTIC_NETINFO
value: "false"
securityContext:
runAsUser: 0
# The following capabilities are needed for 'Defend for containers' integration (cloud-defend)
# If you are using this integration, please uncomment these lines before applying.
#capabilities:
# add:
# - BPF # (since Linux 5.8) allows loading of BPF programs, create most map types, load BTF, iterate programs and maps.
# - PERFMON # (since Linux 5.8) allows attaching of BPF programs used for performance metrics and observability operations.
# - SYS_RESOURCE # Allow use of special resources or raising of resource limits. Used by 'Defend for Containers' to modify 'rlimit_memlock'
########################################################################################
# The following capabilities are needed for Universal Profiling.
# More fine graded capabilities are only available for newer Linux kernels.
# If you are using the Universal Profiling integration, please uncomment these lines before applying.
#procMount: "Unmasked"
#privileged: true
#capabilities:
# add:
# - SYS_ADMIN
resources:
limits:
memory: 700Mi
requests:
cpu: 100m
memory: 400Mi
volumeMounts:
- name: proc
mountPath: /hostfs/proc
readOnly: true
- name: cgroup
mountPath: /hostfs/sys/fs/cgroup
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: varlog
mountPath: /var/log
readOnly: true
- name: etc-full
mountPath: /hostfs/etc
readOnly: true
- name: var-lib
mountPath: /hostfs/var/lib
readOnly: true
- name: etc-mid
mountPath: /etc/machine-id
readOnly: true
- name: sys-kernel-debug
mountPath: /sys/kernel/debug
- name: elastic-agent-state
mountPath: /usr/share/elastic-agent/state
# If you are using the Universal Profiling integration, please uncomment these lines before applying.
#- name: universal-profiling-cache
# mountPath: /var/cache/Elastic
volumes:
- name: proc
hostPath:
path: /proc
- name: cgroup
hostPath:
path: /sys/fs/cgroup
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: varlog
hostPath:
path: /var/log
# The following volumes are needed for Cloud Security Posture integration (cloudbeat)
# If you are not using this integration, then these volumes and the corresponding
# mounts can be removed.
- name: etc-full
hostPath:
path: /etc
- name: var-lib
hostPath:
path: /var/lib
# Mount /etc/machine-id from the host to determine host ID
# Needed for Elastic Security integration
- name: etc-mid
hostPath:
path: /etc/machine-id
type: File
# Needed for 'Defend for containers' integration (cloud-defend) and Universal Profiling
# If you are not using one of these integrations, then these volumes and the corresponding
# mounts can be removed.
- name: sys-kernel-debug
hostPath:
path: /sys/kernel/debug
# Mount /var/lib/elastic-agent-managed/kube-system/state to store elastic-agent state
# Update 'kube-system' with the namespace of your agent installation
- name: elastic-agent-state
hostPath:
path: /var/lib/elastic-agent-managed/kube-system/state
type: DirectoryOrCreate
# Mount required for Universal Profiling.
# If you are using the Universal Profiling integration, please uncomment these lines before applying.
#- name: universal-profiling-cache
# hostPath:
# path: /var/cache/Elastic
# type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: elastic-agent
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: ClusterRole
name: elastic-agent
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
namespace: kube-system
name: elastic-agent
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: Role
name: elastic-agent
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: elastic-agent-kubeadm-config
namespace: kube-system
subjects:
- kind: ServiceAccount
name: elastic-agent
namespace: kube-system
roleRef:
kind: Role
name: elastic-agent-kubeadm-config
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: elastic-agent
labels:
k8s-app: elastic-agent
rules:
- apiGroups: [""]
resources:
- nodes
- namespaces
- events
- pods
- services
- configmaps
# Needed for cloudbeat
- serviceaccounts
- persistentvolumes
- persistentvolumeclaims
verbs: ["get", "list", "watch"]
# Enable this rule only if planing to use kubernetes_secrets provider
#- apiGroups: [""]
# resources:
# - secrets
# verbs: ["get"]
- apiGroups: ["extensions"]
resources:
- replicasets
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
- deployments
- replicasets
- daemonsets
verbs: ["get", "list", "watch"]
- apiGroups:
- ""
resources:
- nodes/stats
verbs:
- get
- apiGroups: [ "batch" ]
resources:
- jobs
- cronjobs
verbs: [ "get", "list", "watch" ]
# Needed for apiserver
- nonResourceURLs:
- "/metrics"
verbs:
- get
# Needed for cloudbeat
- apiGroups: ["rbac.authorization.k8s.io"]
resources:
- clusterrolebindings
- clusterroles
- rolebindings
- roles
verbs: ["get", "list", "watch"]
# Needed for cloudbeat
- apiGroups: ["policy"]
resources:
- podsecuritypolicies
verbs: ["get", "list", "watch"]
- apiGroups: [ "storage.k8s.io" ]
resources:
- storageclasses
verbs: [ "get", "list", "watch" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: elastic-agent
# Should be the namespace where elastic-agent is running
namespace: kube-system
labels:
k8s-app: elastic-agent
rules:
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: elastic-agent-kubeadm-config
namespace: kube-system
labels:
k8s-app: elastic-agent
rules:
- apiGroups: [""]
resources:
- configmaps
resourceNames:
- kubeadm-config
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: elastic-agent
namespace: kube-system
labels:
k8s-app: elastic-agent
---
we also tried to mount the CA as described in this link, but the elastic-agents pod are stuck in ContainerCreating and kubectl get events -n kube-system
are showing the message: cannot mount volumes...
:
// ...
volumeMounts:
...
- name: elastic-ca
mountPath: /etc/ssl/certs/elastic-ca.crt
subPath: elastic-ca.crt
readOnly: true
volumes:
...
- name: elastic-ca
secret:
secretName: elastic-ca
we are lost, If someone can help would be awesome!