Filebeat and Kubernetes: excluding log files

Hi there,

I'm having trouble configuring filebeat on Kubernetes.
Let's say you want Filebeat to collect the container logs from Kubernetes, but you would like to exclude some files (for example, because you don't want to collect the logs of Filebeat itself, which is also running as a pod on Kubernetes).

I thought this prospector config would be right, but no luck so far:

- type: docker
  containers:
    ids:
      - "*"
    path: "/var/log/containers"
  exclude_files:
    - "/var/log/containers/filebeat*.log"
    - "/var/log/containers/logstash*.log"
  processors:
    - add_kubernetes_metadata:
        in_cluster: true
        default_matchers.enabled: false
        matchers:
        - logs_path:
            logs_path: /var/log/containers/

Am I doing something wrong, or is it just not possible at the moment?
This seems to have been made possible by this PR: https://github.com/elastic/beats/pull/4981 .
I'm using filebeat:6.1.3, by the way.

Many thanks,
Jeremie

Hi @jeremievallee,

The exclude_files parameter expects a list of regular expressions, but you wrote glob patterns. Try something like:

- '/var/log/containers/filebeat.*'
- '/var/log/containers/logstash.*'
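In context, the prospector from above would then look something like this (an untested sketch based on the config you posted, with only the exclude_files entries changed to regular expressions):

```yaml
- type: docker
  containers:
    ids:
      - "*"
    path: "/var/log/containers"
  # exclude_files takes regular expressions, not globs
  exclude_files:
    - '/var/log/containers/filebeat.*'
    - '/var/log/containers/logstash.*'
  processors:
    - add_kubernetes_metadata:
        in_cluster: true
        default_matchers.enabled: false
        matchers:
        - logs_path:
            logs_path: /var/log/containers/
```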

Also, if you don't mind reading those files and then discarding the output, you could use the drop_event processor: https://www.elastic.co/guide/en/beats/filebeat/current/drop-event.html

Hi @exekias, thanks for the response. I tried with the single quotes, but it's still not working unfortunately. Actually, even without any exclude_files key at all it doesn't work:

- type: docker
  containers:
    ids:
      - '*'
    path: "/var/log/containers"
  processors:
    - add_kubernetes_metadata:
        in_cluster: true
        default_matchers.enabled: false
        matchers:
        - logs_path:
            logs_path: /var/log/containers/

So perhaps even this config ^ is wrong? I know that by default Filebeat reads logs from /var/lib/docker/containers. However, since the files in that folder are named by container id rather than by name, it's impossible for me to exclude the ones I want, as I can't know the ids in advance.

My understanding from that merged PR (https://github.com/elastic/beats/pull/4981) was that Filebeat could be configured to read the logs from the /var/log/containers directory instead, where the file names do contain the application names. Is that true? Do you have an example for this use case?

Surely I can't be the only one needing to remove some of these logs (I hope :smiley: )

Thanks,
Jeremie

By the way, I did try the drop_event processor and it works. However, the filtering becomes heavy as the list of containers to blacklist grows, which seems very inefficient to me.

I think that config should work. Could you explain the current behavior?

Also, for your use case we have been working on Kubernetes autodiscover, but it hasn't been released yet (it will be available with 6.2): https://github.com/elastic/beats/pull/6055
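For reference, an autodiscover configuration along the lines of that feature might look something like this (a sketch only; the feature was unreleased at the time, so the exact syntax may differ):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        # Only start a prospector for containers we care about;
        # the condition here is illustrative.
        - condition:
            not:
              equals:
                kubernetes.container.name: "filebeat"
          config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"
```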

The current behaviour is that Filebeat starts on every node but does not pick up any files at all. I did mount the /var/log/containers directory into the Filebeat container, and I checked that it can access the files inside that folder.

I will investigate more.

In most cases /var/log/containers contains symlinks to /var/lib/docker/containers/. Is that the case for you? You would need to mount both.

Yes, I am mounting both. Actually, the files in /var/log/containers are symlinks to files in /var/log/pods, which are in turn symlinks to /var/lib/docker/containers.
I started by mounting these three folders, but that didn't help.
I'm now testing mounting /var/log entirely, to see if that works.

No luck, unfortunately.
I also tried using the log type instead of docker:

    - type: log
      paths:
        - /var/log/containers/*.log
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            default_matchers.enabled: false
            matchers:
            - logs_path:
                logs_path: /var/log/containers/

But that didn't help; Filebeat doesn't pick up anything. Perhaps @Sven_Woltmann would know?

Worst case scenario, I know that reading the files and using drop_event works, with the following config:

    - type: docker
      containers.ids:
      - "*"
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
        - drop_event:
            when:
              equals:
                kubernetes.container.name: "filebeat"
        - drop_event:
            when:
              equals:
                kubernetes.container.name: "logstash"

It's just that I don't think this is really efficient or scalable. But perhaps I'm wrong?
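As an aside, the two drop_event processors above could likely be merged into a single one using an or condition (an untested sketch; it still reads every file, so it doesn't address the efficiency concern):

```yaml
processors:
  - drop_event:
      when:
        or:
          - equals:
              kubernetes.container.name: "filebeat"
          - equals:
              kubernetes.container.name: "logstash"
```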

Many thanks,
Jeremie

Hi @jeremievallee,

it seems that the format of your configuration file is wrong: for example, processors should not be on the same level as type and paths, but on the same level as filebeat.prospectors (the parent of type and paths).

Here's the config file I'm running on our production cluster. However, I have not yet updated to 6.1 and am still using 6.0 beta plus my own changes, so my config might not be fully up to date.

filebeat.prospectors:
  - type: log
    paths:
      - "/var/log/containers/*.log"

    # Don't read my own logs, and some others:
    exclude_files:
      - filebeat-.*\.log
      - default-http-backend-.*\.log
      - nginx-ingress-controller-.*\.log

    # Keys are copied top level in the output document:
    json.keys_under_root: true

    # Filebeat adds an "error.message" and "error.type: json" key in case of JSON unmarshalling errors:
    json.add_error_key: true

    # Allow Filebeat to harvest symlinks in addition to regular files:
    symlinks: true

filebeat.shutdown_timeout: 5s

filebeat.registry_file: /var/log/containers/filebeat_registry

name: ${NODE_NAME}

processors:
  # In logs from our microservices, "log" contains a JSON object.
  # In logs from Kubernetes services, "log" contains the log message.
  # Fortunately, Filebeat detects the difference and decodes "log" only when it contains an escaped JSON string.
  - decode_json_fields:
      fields: ["log"]
      # Merge the decoded JSON fields into the root of the event:
      target: ""
  - add_kubernetes_metadata:
      in_cluster: true

output.elasticsearch:
  hosts:
    - xxxxxxxx
    - xxxxxxxx
    - xxxxxxxx
  username: xxxxxxxx
  password: xxxxxxxx
  index: "filebeat-%{[beat.version]}-kube-prod-%{+yyyy.MM.dd}"
  bulk_max_size: 2500

# These are required since 6.0.0-beta2 if output.elasticsearch.index is defined
setup.template.name: "filebeat-%{[beat.version]}"
setup.template.pattern: "filebeat-%{[beat.version]}-*"

Hope that helps.

Sven

I forgot to mention that I'm mounting these three folders into the Filebeat container:

  • /var/log/containers (files here are symlinks to /var/log/pods/...)
  • /var/log/pods (files here are symlinks to /var/lib/docker/containers/...)
  • /var/lib/docker/containers
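In a Filebeat DaemonSet manifest, those mounts might look like this (a sketch; the volume names are illustrative, not from my actual manifest):

```yaml
# Excerpt from a DaemonSet pod spec
containers:
  - name: filebeat
    # ...image, args, etc. omitted...
    volumeMounts:
      - name: varlogcontainers
        mountPath: /var/log/containers
        readOnly: true
      - name: varlogpods
        mountPath: /var/log/pods
        readOnly: true
      - name: varlibdockercontainers
        mountPath: /var/lib/docker/containers
        readOnly: true
volumes:
  - name: varlogcontainers
    hostPath:
      path: /var/log/containers
  - name: varlogpods
    hostPath:
      path: /var/log/pods
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
```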

Hi @Sven_Woltmann! Thanks a lot for your messages, I finally got it working! :tada:

You're right, the indentation was off, and I also didn't have the symlinks: true option set. Here's my final config, and I can confirm it works with filebeat:6.1.3:

filebeat.prospectors:
  - type: log
    paths:
      - "/var/log/containers/*.log"
    exclude_files:
      - filebeat-.*\.log
      - logstash-.*\.log
    json.message_key: log
    json.add_error_key: true
    json.keys_under_root: true
    symlinks: true
    tail_files: true

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      default_matchers.enabled: false
      matchers:
      - logs_path:
          logs_path: /var/log/containers/

filebeat.shutdown_timeout: 5s

output.logstash:
  hosts: ["logstash:5044"]

Also worth mentioning that I'm mounting:

  • /var/lib/docker/containers
  • /var/log/pods
  • /var/log/containers

All three in readOnly mode.

Thanks a lot @exekias and @Sven_Woltmann for your help :slight_smile:

Jeremie
