Hi,
Thought I should check in with the community here before creating a github issue, just in case there's something I'm not understanding correctly.
After upgrading our filebeat kubernetes daemonset from 7.17.8 to 8.6.0, I can't seem to get rid of errors of this type:
log.level: error
log.logger: autodiscover
log.origin.file.line: 109
log.origin.file.name: cfgfile/list.go
message:
Error creating runner from config: failed to create input: Can only start an input when all related states are finished: {Id: ea745ab688be85a9-native::3884203-2049, Finished: false, Fileinfo: &{frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log 374 416 {204820038 63804978609 0x56347552d700} {2049 3884203 1 33184 0 0 0 0 374 4096 8 {1669381808 728813408} {1669381809 204820038} {1669381809 204820038} [0 0 0]}}, Source: /var/log/containers/frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log, Offset: 0, Timestamp: 2023-01-19 13:38:27.166489276 +0000 UTC m=+58865.698641043, TTL: -1ns, Type: container, Meta: map[stream:stdout], FileStateOS: 3884203-2049}
The number of errors varies depending on the number of pods deployed. In our current prod cluster I'm observing roughly 60k messages per 24h.
Filebeat is currently deployed as a daemonset using the official helm chart version 8.5.1, running in Azure AKS on kubernetes version 1.24.6.
This is the relevant part of our current filebeat configuration (I've excluded output.* and setup.*):
logging:
  level: warning
  metrics.enabled: false
  json: true

processors:
  - # disable logs from select sources
    drop_event.when.or:
      - equals.kubernetes.labels.app: "secrets-store-csi-driver"
      - equals.kubernetes.labels.app: "secrets-store-provider-azure"
      - equals.kubernetes.labels.app: "konnectivity-agent"

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      cleanup_timeout: 2m
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*-${data.kubernetes.container.id}.log
      templates:
        - # nginx logs: configure the filebeat nginx module
          condition.equals:
            # This pod annotation must be set on the app during deployment for this template to be applied
            # See available fields for matching here:
            # https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover.html#_kubernetes
            kubernetes.annotations.no.dsb-norge.filebeat/autodiscover-template: nginx
          config:
            - module: nginx
              ingress_controller:
                enabled: false
              access:
                enabled: true
                input:
                  type: container
                  stream: stdout
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
              error:
                enabled: true
                input:
                  type: container
                  stream: stderr
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
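For context, the annotation referenced by the template condition is set on the nginx pods at deploy time. Roughly like this (the deployment name and surrounding structure here are just illustrative; only the annotation key/value is from our actual setup):

# Illustrative pod template metadata; only the annotation itself is from our config
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend   # example name
spec:
  template:
    metadata:
      annotations:
        no.dsb-norge.filebeat/autodiscover-template: "nginx"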
I'm able to avoid the error by either removing the templates config or by disabling hints.default_config. Neither of these is a suitable solution for us, as they result in missing logs or logs not being parsed correctly.
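For completeness, by "disabling hints.default_config" I mean something roughly like this (a minimal sketch; if I'm reading the docs right, hints.default_config.enabled: false is the supported way to do it, the rest of the provider config stays as above):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      # with the default config disabled, only pods with co.elastic.logs/* hints
      # (or a matching template) get an input
      hints.default_config.enabled: false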
The error messages all refer to log files from our nginx pods. We have multiple other types of deployments for which filebeat indicates no issues. Since we are using the nginx module conditionally for these pods, this leads me to think there's some kind of race condition when the nginx module is applied via the templates config in combination with the default hints configuration.
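If that hypothesis is right, an nginx pod would end up with two generated inputs pointing at the same container log file, roughly like this (a sketch of what I assume the effective configs look like, not actual filebeat output):

# From hints.default_config:
- type: container
  paths:
    - /var/log/containers/*-<nginx-container-id>.log

# From the nginx module template:
- module: nginx
  access:
    input:
      type: container
      stream: stdout
      paths:
        - /var/log/containers/*-<nginx-container-id>.log

Two inputs trying to harvest the same file would fit the "all related states are finished" error, but I haven't been able to confirm that this is actually what's happening.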
We were able to achieve autodiscover with hints (including default_config) and templates using filebeat 7.17.8 just fine, without errors, running on the same kubernetes version and deployed with official helm chart version 7.17.3.
At first I thought I might be experiencing github Issue #11834: [autodiscover] Error creating runner from config: Can only start an input when all related states are finished. But after reading it I saw that this was fixed in filebeat 7.9.0. Reading a bit further I saw that there were a couple more issues resulting in this error, but those had also been fixed:
- github Issue #20850: Improve logging on autodiscover recoverable errors
- github Issue #20568: Improve logging when autodiscover configs fail
I have verified that we are not missing log entries, and therefore I suspect that my issue is also a "recoverable error" and that it should possibly not be logged at error level.
Anyway, fingers crossed that some of you have experienced something similar or that you can spot an issue in our configuration.