Kubernetes autodiscover sending logs from only some of the identical nodes in one of our clusters

Hi,

We're running filebeat as a daemonset on all nodes of our on-premises kubernetes clusters.
(Filebeat sends logs to logstash, logstash does some processing and sends them on to elasticsearch.)

On one of our clusters this is working fine. On the other one we have this issue:
Only four of the filebeats pick up logs and send them on. If I delete all filebeat pods, the four nodes whose logs are sent are often four different ones, but never more than four.

I'm running filebeat in debug, which shows - for the working ones - logs like this:

2018-12-31T12:49:43.979Z	DEBUG	[publish]	pipeline/processor.go:308	Publish event: {
"@timestamp": "2018-12-31T12:49:41.882Z",
"@metadata": {
  "beat": "filebeat",
  "type": "doc",
  "version": "6.5.4"
},
"offset": 19062135,
"stream": "stdout",
"prospector": {
  "type": "docker"
},
"input": {
  "type": "docker"
},
"message": "2018-12-31 12:49:41.875 [debug] <0.24878.685> Supervisor {<0.24878.685>,rabbit_connection_sup} started rabbit_connection_helper_sup:start_link() at pid <0.24831.685>", "source": "/var/lib/docker/containers/b03f9c1b1b6f9c7811a680b7149ecb3397b9f1d03380a305d3bda46baf0e2712/b03f9c1b1b6f9c7811a680b7149ecb3397b9f1d03380a305d3bda46baf0e2712-json.log",
"kubernetes": {
  "node": {
    "name": "lcl-worker-005"
  },
  "container": {
    "name": "rabbitmq"
  },
  "namespace": "development-1",
  "statefulset": {
    "name": "rabbitmq"
  },
  "labels": {
    "controller-revision-hash": "rabbitmq-7d88bd67f",
    "statefulset": {
      "kubernetes": {
        "io/pod-name": "rabbitmq-0"
      }
    },
    "component": "rabbitmq"
  },
  "pod": {
    "name": "rabbitmq-0"
  }
},
"beat": {
  "name": "filebeat-t8nfr",
  "hostname": "filebeat-t8nfr",
  "version": "6.5.4"
},
"host": {
  "name": "filebeat-t8nfr"
}

}

And metrics:

{
    "monitoring": {
        "metrics": {
            "beat": {
                "cpu": {
                    "system": {
                        "ticks": 31630,
                        "time": {
                            "ms": 203
                        }
                    },
                    "total": {
                        "ticks": 249710,
                        "time": {
                            "ms": 1513
                        },
                        "value": 249710
                    },
                    "user": {
                        "ticks": 218080,
                        "time": {
                            "ms": 1310
                        }
                    }
                },
                "handles": {
                    "limit": {
                        "hard": 1048576,
                        "soft": 1048576
                    },
                    "open": 35
                },
                "info": {
                    "ephemeral_id": "935d404b-d5b9-4458-a8c9-80818c2b7107",
                    "uptime": {
                        "ms": 5132179
                    }
                },
                "memstats": {
                    "gc_next": 18202432,
                    "memory_alloc": 11776776,
                    "memory_total": 16818113864,
                    "rss": -491520
                }
            },
            "filebeat": {
                "events": {
                    "active": -184,
                    "added": 2953,
                    "done": 3137
                },
                "harvester": {
                    "open_files": 27,
                    "running": 27
                }
            },
            "libbeat": {
                "config": {
                    "module": {
                        "running": 0
                    }
                },
                "output": {
                    "events": {
                        "acked": 2970,
                        "active": -49,
                        "batches": 24,
                        "total": 2921
                    },
                    "read": {
                        "bytes": 144
                    },
                    "write": {
                        "bytes": 243393
                    }
                },
                "pipeline": {
                    "clients": 59,
                    "events": {
                        "active": 61,
                        "published": 2892,
                        "total": 2892
                    },
                    "queue": {
                        "acked": 2970
                    }
                }
            },
            "registrar": {
                "states": {
                    "current": 129,
                    "update": 3027
                },
                "writes": {
                    "success": 26,
                    "total": 25
                }
            },
            "system": {
                "load": {
                    "1": 5.43,
                    "15": 13.28,
                    "5": 10.84,
                    "norm": {
                        "1": 1.3575,
                        "15": 3.32,
                        "5": 2.71
                    }
                }
            }
        }
    }
}

continues in reply

The ones not sending logs don't seem to log any debug statements (after booting, that is), and log the following metrics (emphasis mine):

{
    "monitoring": {
        "metrics": {
            "beat": {
                "cpu": {
                    "system": {
                        "ticks": 330,
                        "time": {
                            "ms": 2
                        }
                    },
                    "total": {
                        "ticks": 1070,
                        "time": {
                            "ms": 4
                        },
                        "value": 1070
                    },
                    "user": {
                        "ticks": 740,
                        "time": {
                            "ms": 2
                        }
                    }
                },
                "handles": {
                    "limit": {
                        "hard": 1048576,
                        "soft": 1048576
                    },
                    "open": 5
                },
                "info": {
                    "ephemeral_id": "bc5d10c5-988a-41a7-a1e2-cfeedc93d142",
                    "uptime": {
                        "ms": 5580466
                    }
                },
                "memstats": {
                    "gc_next": 6336000,
                    "memory_alloc": 3965560,
                    "memory_total": 64719856
                }
            },"filebeat": {
                "harvester": {
                    "open_files": 0,
                    "running": 0
                }
            },
            "libbeat": {
                "config": {
                    "module": {
                        "running": 0
                    }
                },
                "pipeline": {
                    "clients": 0,
                    "events": {
                        "active": 0
                    }
                }
            },
            "registrar": {
                "states": {
                    "current": 0
                }
            },
            "system": {
                "load": {
                    "1": 10.59,
                    "15": 11.61,
                    "5": 12.68,
                    "norm": {
                        "1": 2.6475,
                        "15": 2.9025,
                        "5": 3.17
                    }
                }
            }
        }
    }
}

config of filebeat:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    tail_files: true
    max_backoff: 60s
    logging.level: debug
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
    output.logstash:
      hosts: ['logstash:5044']
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-inputs
  namespace: logging
  labels:
    k8s-app: filebeat
data:
  kubernetes.yml: |-
    - type: docker
      containers.ids:
      - "*"
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
        - drop_fields:
            when:
              has_fields: ['kubernetes.labels.app']
            fields:
              - 'kubernetes.labels.app'
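
For reference, with hints.enabled the provider reads co.elastic.logs/* annotations from pod metadata and turns them into input settings; the multiline settings visible in the autodiscover debug output further down come from annotations on the workloads themselves. A minimal sketch of such a pod spec (the annotation values here are illustrative, not copied from our manifests):

apiVersion: v1
kind: Pod
metadata:
  name: rabbitmq-0                     # illustrative; ours are managed by a statefulset
  annotations:
    # multiline hint, as seen in the "Generated builder event" debug line further down
    co.elastic.logs/multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}'
    co.elastic.logs/multiline.negate: "true"
    co.elastic.logs/multiline.match: after
spec:
  containers:
    - name: rabbitmq
      image: rabbitmq:3.7              # illustrative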

daemonset:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
      annotations:
        co.elastic.logs/disable: "true"
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat-oss:6.5.4
          args: [
            "-c", "/etc/filebeat.yml",
            "-e",
          ]
          securityContext:
            runAsUser: 0
          resources:
            requests:
              cpu: 50m
              memory: 50Mi
            limits:
              cpu: 50m
              memory: 100Mi
          volumeMounts:
            - name: config
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
            - name: inputs
              mountPath: /usr/share/filebeat/inputs.d
              readOnly: true
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      # # allow filebeat to also be scheduled on master nodes, so we can pick up kubernetes logs
      # tolerations:
      #   - key: node-role.kubernetes.io/master
      #     effect: NoSchedule
      volumes:
        - name: config
          configMap:
            defaultMode: 0600
            name: filebeat-config
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: inputs
          configMap:
            defaultMode: 0600
            name: filebeat-inputs
        # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
        - hostPath:
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
          name: data

filebeat, logstash, elasticsearch: 6.5.4
kubernetes version: 1.9.5

I'm looking for help figuring out what's going on here. The differences from the cluster where everything works fine: that cluster has a whole lot less load and runs a different linux kernel.

This line in the metrics of the non-working filebeat worries me:

"harvester": {
    "open_files": 0,
    "running": 0
  }

On nodes where filebeat logs this, there are still lots of log files which haven't been read yet.
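
To get a better look at what the idle filebeats are (not) doing without drowning in publish events, I'm thinking of limiting the debug output to the discovery-related components. A sketch, assuming the selector names match the bracketed tags in the log output ([autodiscover], [kubernetes], [hints.builder], [bus]):

logging.level: debug
# only emit debug lines from the components involved in finding containers to harvest
logging.selectors: ["autodiscover", "kubernetes", "hints.builder", "bus"]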

This is the startup log of one of the filebeats that fails to send logs on:

2018-12-31T11:24:09.586Z	INFO	instance/beat.go:592	Home path: [/usr/share/filebeat] Config path: [/usr/share/filebeat] Data path: [/usr/share/filebeat/data] Logs path: [/usr/share/filebeat/logs]
2018-12-31T11:24:09.614Z	DEBUG	[beat]	instance/beat.go:629	Beat metadata path: /usr/share/filebeat/data/meta.json
2018-12-31T11:24:09.614Z	INFO	instance/beat.go:599	Beat UUID: dd6254b9-9915-4b99-b145-897ce65217cf
2018-12-31T11:24:09.614Z	DEBUG	[seccomp]	seccomp/seccomp.go:109	Loading syscall filter	{"seccomp_filter": {"no_new_privs":true,"flag":"tsync","policy":{"default_action":"errno","syscalls":[{"names":["accept","accept4","access","arch_prctl","bind","b...
2018-12-31T11:24:09.618Z	INFO	[seccomp]	seccomp/seccomp.go:116	Syscall filter successfully installed
2018-12-31T11:24:09.618Z	INFO	[beat]	instance/beat.go:825	Beat info	{"system_info": {"beat": {"path": {"config": "/usr/share/filebeat", "data": "/usr/share/filebeat/data", "home": "/usr/share/filebeat", "logs": "/usr/share/filebeat/logs"}, "type"...
2018-12-31T11:24:09.618Z	INFO	[beat]	instance/beat.go:834	Build info	{"system_info": {"build": {"commit": "bd8922f1c7e93d12b07e0b3f7d349e17107f7826", "libbeat": "6.5.4", "time": "2018-12-17T20:23:13.000Z", "version": "6.5.4"}}}
2018-12-31T11:24:09.618Z	INFO	[beat]	instance/beat.go:837	Go runtime info	{"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.10.6"}}}
2018-12-31T11:24:09.679Z	INFO	[beat]	instance/beat.go:841	Host info	{"system_info": {"host": {"architecture":"x86_64","boot_time":"2018-09-24T06:45:30Z","containerized":true,"name":"filebeat-6hrsr","ip":["127.0.0.1/8","::1/128","10.36.0.115/12","fe80::7427:...
2018-12-31T11:24:09.680Z	INFO	[beat]	instance/beat.go:870	Process info	{"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","...
2018-12-31T11:24:09.680Z	INFO	instance/beat.go:278	Setup Beat: filebeat; Version: 6.5.4
2018-12-31T11:24:09.680Z	DEBUG	[beat]	instance/beat.go:299	Initializing output plugins
2018-12-31T11:24:09.680Z	DEBUG	[processors]	processors/processor.go:66	Processors: 
2018-12-31T11:24:09.697Z	DEBUG	[publish]	pipeline/consumer.go:137	start pipeline event consumer
2018-12-31T11:24:09.697Z	INFO	[publisher]	pipeline/module.go:110	Beat name: filebeat-6hrsr
2018-12-31T11:24:09.807Z	INFO	[monitoring]	log/log.go:117	Starting metrics logging every 30s
2018-12-31T11:24:09.807Z	INFO	instance/beat.go:400	filebeat start running.
2018-12-31T11:24:09.807Z	DEBUG	[registrar]	registrar/registrar.go:114	Registry file set to: /usr/share/filebeat/data/registry
2018-12-31T11:24:09.807Z	INFO	registrar/registrar.go:134	Loading registrar data from /usr/share/filebeat/data/registry
2018-12-31T11:24:09.809Z	INFO	registrar/registrar.go:141	States Loaded from registrar: 98
2018-12-31T11:24:09.809Z	WARN	beater/filebeat.go:374	Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2018-12-31T11:24:09.809Z	INFO	crawler/crawler.go:72	Loading Inputs: 0
2018-12-31T11:24:09.809Z	INFO	crawler/crawler.go:106	Loading and starting Inputs completed. Enabled inputs: 0
2018-12-31T11:24:09.809Z	WARN	[cfgwarn]	kubernetes/kubernetes.go:51	BETA: The kubernetes autodiscover is beta
2018-12-31T11:24:09.809Z	DEBUG	[registrar]	registrar/registrar.go:267	Starting Registrar
2018-12-31T11:24:09.809Z	INFO	kubernetes/util.go:86	kubernetes: Using pod name filebeat-6hrsr and namespace logging to discover kubernetes node
2018-12-31T11:24:09.944Z	INFO	kubernetes/util.go:93	kubernetes: Using node hal-worker-003 discovered by in cluster pod node query
2018-12-31T11:24:09.945Z	WARN	[cfgwarn]	hints/logs.go:56	BETA: The hints builder is beta
2018-12-31T11:24:09.945Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:82	Configured autodiscover provider: kubernetes
2018-12-31T11:24:09.945Z	INFO	autodiscover/autodiscover.go:103	Starting autodiscover manager
2018-12-31T11:24:09.945Z	INFO	kubernetes/watcher.go:180	kubernetes: Performing a resource sync for *v1.PodList
2018-12-31T11:24:10.486Z	DEBUG	[kubernetes]	kubernetes/kubernetes.go:107	Watcher Pod add: metadata:<name:"message-service-9df9fc8f5-dbx49" generateName:"message-service-9df9fc8f5-" namespace:"demo" selfLink:"/api/v1/namespaces/demo/pods/messa...
2018-12-31T11:24:10.486Z	DEBUG	[kubernetes]	kubernetes/kubernetes.go:265	Generated builder event map[kubernetes:{"annotations":{"co":{"elastic":{"logs/multiline":{"match":"after","negate":"true","pattern":"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}...
2018-12-31T11:24:10.487Z	DEBUG	[hints.builder]	hints/logs.go:145	generated config &{ctx:{parent:<nil> field:} metadata:<nil> fields:0xc4201f92c0}
2018-12-31T11:24:10.487Z	DEBUG	[bus]	bus/bus.go:72	filebeat: map[host:10.36.0.131 port:8080 kubernetes:{"annotations":{"co":{"elastic":{"logs/multiline":{"match":"after","negate":"true","pattern":"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:...
...

Still stuck on this issue... I checked whether the maximum number of open files is reached; that is not the case.

It seems like Filebeat cannot find the log files, so it does not start any harvesters. Are you sure the files you would like to read are present? Are all folders mounted correctly?
Have you looked at this option? https://www.elastic.co/guide/en/beats/filebeat/6.5/filebeat-input-log.html#filebeat-input-log-scan-frequency
Is it possible that the input files are rotated or removed in some fashion, so that Filebeat misses them? If so, you could try decreasing scan_frequency so that quick file changes do not interfere with log collection.
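
Since you are on autodiscover rather than a static input, one place such a setting could go is a templates-based provider config, where the generated docker input carries the lower scan_frequency. A rough sketch based on the 6.5 autodiscover docs (untested against your setup; a condition can be added to scope it to certain pods):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"
              scan_frequency: 1s   # default is 10s; lower it if files change or rotate quickly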


Hi, thanks for the reply.

Are you sure the files you would like to read are present? Are all folders mounted correctly?
Yes.
If I exec into one of the problem filebeats, ls /var/lib/docker/containers/ shows all the containers on this host.
The problem filebeats didn't pick up any of their logs, while the non-problem filebeats do pick up theirs.
I'm not really sure how the kubernetes autodiscover provider picks up on these log files.
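
(My rough understanding, from the docs and the debug output above: for every container the watcher reports, the hints builder generates a docker input scoped to that container's ID, and that input then reads the <container-id>-json.log file under /var/lib/docker/containers/<container-id>/. Something along these lines, using the rabbitmq container from the working node as an example; please correct me if that's off:)

# roughly what the hints builder seems to generate per discovered container
- type: docker
  containers.ids:
    # the generated input is scoped to a single discovered container
    - "b03f9c1b1b6f9c7811a680b7149ecb3397b9f1d03380a305d3bda46baf0e2712"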

Have you looked at this option? https://www.elastic.co/guide/en/beats/filebeat/6.5/filebeat-input-log.html#filebeat-input-log-scan-frequency
I looked at it, but it is still on the default. I'll try fiddling with it, but I don't really see how this will help.

Is it possible that the input files are rotated or removed in some fashion, so that Filebeat misses them? If so, you could try decreasing scan_frequency so that quick file changes do not interfere with log collection.

These are our docker log-opts:
--log-opt max-size=50m --log-opt max-file=5
Rotation happens rather slowly.

What is the behaviour of filebeat if it tries to read the log files but the filesystem is really slow?
Does it print errors on a timeout?
