Hi,
Two days ago, we began regularly receiving the "KubeAPIErrorsHigh" alert on our Kubernetes cluster (definition of the alert here).
To track down the cause, we looked at the API server logs and saw the following error messages in the kube-apiserver pods:
```
$ stern kube-apiserver -n kube-system | grep -v "TLS handshake error" | grep -i error
kube-apiserver-ip-10-XX-XX-XX.eu-central-1.compute.internal kube-apiserver I0820 17:13:14.022460 1 log.go:172] http2: server: error reading preface from client 10.XX-XX-XX:55385: read tcp 10.XX-XX-XX:6443->10.XX-XX-XX:55385: read: connection reset by peer
kube-apiserver-ip-10-XX-XX-XX.eu-central-1.compute.internal kube-apiserver E0820 17:14:23.539314 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
kube-apiserver-ip-10-XX-XX-XX.eu-central-1.compute.internal kube-apiserver E0820 17:14:23.895988 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
```
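To narrow down which client was resetting connections, we grouped the "error reading preface" lines by source IP and matched the top IPs to pod IPs. A sketch of what we ran (here `apiserver.log` is an assumed filename for a saved log dump; adjust the path to your setup):

```shell
# Count "error reading preface" occurrences per client IP in a saved
# kube-apiserver log dump ("apiserver.log" is an assumed filename).
# The IPs it prints can then be matched against `kubectl get pods -o wide`.
grep 'error reading preface from client' apiserver.log \
  | sed -E 's/.*from client ([0-9.]+):[0-9]+.*/\1/' \
  | sort | uniq -c | sort -rn
```

This prints one line per client IP, most frequent first, which is how we narrowed the errors down to the Filebeat pods.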
After going through our components one by one to find the culprit, we found that our Filebeat 7.8.1 DaemonSet was responsible for these errors.
Then, after many tests on the Filebeat config, we concluded that the errors disappeared if we removed the `add_kubernetes_metadata:` processor from the `filebeat.yml` config.
Our API server Grafana dashboard also clearly showed the errors disappearing after we removed this option.
Here is the config file we used in our chart deployment:
```yaml
---
filebeatConfig:
  filebeat.yml: |
    logging.metrics.enabled: false
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            ## Metadata namespace
            - condition.contains:
                kubernetes.namespace: metadata
              config:
                - type: container
                  paths:
                    - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log
                  exclude_lines: ["^\\s+[\\-`('.|_]"] # drop asciiart lines
    processors:
      # Add config here to drop events
      - add_cloud_metadata:
      # ******************************** THIS LINE CAUSES THE API SERVER ERRORS
      - add_kubernetes_metadata:
      # Drop all events
      #- drop_event:
      #    when.has_fields: ["kubernetes.namespace"]
      # Drop filebeat events
      - drop_event:
          when.equals:
            kubernetes.container.name: "filebeat"
    output.logstash:
      hosts: ["logstash.url"]
```
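In the meantime, our workaround is simply dropping the processor. As far as I understand, the kubernetes autodiscover provider already enriches events with `kubernetes.*` fields, so the processor may be redundant in this setup anyway (please correct me if that assumption is wrong). A sketch of the resulting `processors` section:

```yaml
processors:
  # add_kubernetes_metadata removed: the kubernetes autodiscover provider
  # above already attaches kubernetes.* fields to each event (assumption
  # based on the autodiscover docs, not verified beyond our own cluster)
  - add_cloud_metadata:
  - drop_event:
      when.equals:
        kubernetes.container.name: "filebeat"
```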
Environment:
- K8s version: 1.17.6
- Filebeat versions: 7.8.1 and 7.9 (both tested)
- Filebeat installation: official `elastic/filebeat` chart from https://helm.elastic.co
Additional info: we did not have this problem with our previous Filebeat 7.5.0; the errors appeared after the 7.5.0 => 7.8.1 upgrade.
I was going to open an issue, but the GitHub template mentioned that I should post a question here first, so here it is.
Should I open an issue?