Filebeat Kubernetes Pod Publishing Falls Behind for Only Some Files in Container Path Input

Greetings!

I have deployed the Filebeat DaemonSet to collect container logs.

Kubernetes YAML:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: filebeat
data:
  filebeat.yml: |-
    filebeat.config:
      inputs:
        # Mounted `filebeat-inputs` configmap:
        path: ${path.config}/inputs.d/*.yml
        # Reload inputs configs as they change:
        reload.enabled: false
      modules:
        path: ${path.config}/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false

    output.logstash:
      hosts: ${LOGSTASH_HOSTS:?No logstash host configured. Use env var LOGSTASH_HOSTS to set hosts.}
      timeout: 15

    max_procs: 1
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-inputs
  namespace: filebeat
  labels:
    k8s-app: filebeat
data:
  kubernetes.yml: |-
    - type: docker
      containers:
        ids:
          - "*"
      fields:
        type: @typeField@
        cluster_name: ${CLUSTER_NAME}
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            labels.dedot: true
            annotations.dedot: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: filebeat
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
      annotations:
        git_commit_id: "@gitCommit@"
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:6.7.1
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: CLUSTER_NAME
          value: @env@
        - name: LOGSTASH_HOSTS
          value: logstash.mydomain.com:5044
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: inputs
          mountPath: /usr/share/filebeat/inputs.d
          readOnly: true
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: inputs
        configMap:
          defaultMode: 0600
          name: filebeat-inputs
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
      tolerations:
      - key: "node-role.kubernetes.io/etcd"
        operator: "Exists"
        effect: "NoExecute"
      - key: "node-role.kubernetes.io/controlplane"
        operator: "Exists"
        effect: "NoSchedule"
      - key: node_type
        operator: Equal
        value: large
        effect: NoSchedule
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: filebeat
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: filebeat
  labels:
    k8s-app: filebeat

logstash.mydomain.com:5044 is an AWS NLB with 2 Logstash hosts behind it.

Kubernetes hosts running the pod: m5.2xlarge
m5.2xlarge = 8 vCPU, 32 GiB memory, 4,750 Mbps on EBS

The log files under /var/lib/docker/containers/* that Filebeat is harvesting vary widely in size, and each grows at a different rate. When I refresh my Kibana query, I see that log files growing at about 200KB/s are behind in Kibana.

I updated the Filebeat args to also include -d "*" so that I could see any errors. For the container that is writing about 200KB/s, I got the container ID and then tailed the Filebeat logs with a grep for that container ID. The latest "published" event that I see in the Filebeat logs matches the latest event I see in Kibana, so it seems as though the harvester is not able to keep up.

In Kibana I am also querying for the specific Filebeat pod, and I do see events for the other, smaller log files coming in almost in real time, so the pod itself does not seem to be having memory or CPU issues. The pod's CPU and memory usage is also well below the container resource requests.
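
For reference, the debug change to the container args mentioned above looked roughly like the following. This is a sketch rather than an exact copy of what is deployed: only the "-d", "*" entry is new relative to the DaemonSet above, and the asterisk is quoted so that YAML does not try to parse it as an alias.

```yaml
        # Sketch of the filebeat container args with all debug selectors enabled.
        # Everything except the "-d", "*" pair is unchanged from the DaemonSet above.
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
          "-d", "*",
        ]
```

Debug logging is very verbose, so this is only meant to stay in place while investigating the lag.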

I am not sure why some files from the same Filebeat pod are coming in almost in real time while others are not.

Some resources that I have been reviewing:

Please help :)