Filebeat Kubernetes autodiscover: post-processing a specific field with another Filebeat module

I have set up Filebeat on Kubernetes (ECK) following the sample and guide from the docs:

ECK: 2.5.0
filebeat: 8.5.3

The Filebeat deployment and configuration are already running.
Here's a snippet of the manifest used to deploy Filebeat:

$ cat filebeat-eck.autodiscover.yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: elastic
spec:
  type: filebeat
  version: 8.5.3
  elasticsearchRef:
    name: elastic
  config:
    filebeat.autodiscover.providers:
      - node: ${NODE_NAME}
        type: kubernetes
        hints:
          enabled: true
          default_config:
            type: container
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
  daemonSet:
    podTemplate:
      spec:
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        containers:
          - name: filebeat
            securityContext:
              runAsUser: 0

Kubernetes logs have been successfully parsed. However, in the example above, I have a Pod running NGINX, and I want to process its "message" field with the Filebeat NGINX module, since the "message" field exactly matches what that module expects.

I've read about a possibly related feature in Filebeat: "processors".

The document said that:

The libbeat library provides processors for:
- reducing the number of exported fields
- enhancing events with additional metadata
- performing additional processing and decoding

So it can be used for performing additional processing and decoding. It's an amazing feature.

event -> processor 1 -> event1 -> processor 2 -> event2 ...

However, from the docs, I understand that it can only process events with the pre-defined/supported processors.

I see there's a "dissect" processor that can be used to add custom enrichment for a specific field (like we can do with Logstash). However, a custom filter/grok is not actually what I expect here, since Filebeat itself already has many built-in modules (which include pipelines/filters), e.g. nginx.
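For reference, a dissect processor in filebeat.yml looks roughly like this (a sketch; the tokenizer pattern and target_prefix are illustrative, not taken from an actual NGINX setup):

```yaml
processors:
  - dissect:
      # split the raw message into named fields; this pattern is illustrative
      tokenizer: '%{client_ip} - - [%{access_time}] "%{verb} %{url} HTTP/%{http_version}" %{status} %{bytes}'
      field: "message"
      target_prefix: "dissect"
```

This works, but it means re-implementing by hand the parsing that the nginx module already ships with.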

Is it possible to do something like this?

[module] -> [ event ] -> [ processor ] (process a specific field with another Filebeat module) -> [ event ] -> ...

I have considered post-processing with Logstash. However, if this can be done in Filebeat using a pre-defined module/processor, that would be better.

I've taken a look at some possible features.

If there's a better practice for doing this, any help and suggestions would be greatly appreciated!

Thanks! :slightly_smiling_face:

Yes, you can use hints to apply module-specific parsing to container logs. You can also specify it in the autodiscover section of the config, as seen here: Autodiscover | Filebeat Reference [8.5] | Elastic
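As a sketch of the second option, a conditions-based template in the autodiscover section (adapted from the autodiscover docs; the `kubernetes.labels.app: webapp` condition is an assumption for this setup):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        - condition:
            equals:
              kubernetes.labels.app: webapp   # assumed label on the NGINX pods
          config:
            - module: nginx
              access:
                input:
                  type: container
                  stream: stdout
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
              error:
                input:
                  type: container
                  stream: stderr
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
```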


Thanks Alex, for the suggestion.

I'm still reading the docs and trying to understand them :smiley:
After more than 10 hours, I took a rest, read them again, and found that there are "hints" we can use in the pod annotations.

@kgfathur or you can configure a Filebeat input for nginx if you want to keep the autodiscovery simple :slight_smile:


Thanks all, for the recommendation and suggestion.

Finally, the pod/container logs with a specific log format are now perfectly parsed using Hints based autodiscover | Filebeat Reference [8.5] | Elastic.

When I first read the docs, I didn't get how to supply the "hint" that makes Filebeat parse a Kubernetes pod/container log with a different module.
The key is "annotations" (for Kubernetes) or "labels" (for containers: Docker, Podman, etc.).
In my case, on Kubernetes, every pod/container whose logs need to be processed by a Filebeat module/pipeline needs some annotations. For a Deployment, we can create a sample manifest like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
  labels:
    app: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
      annotations:
        co.elastic.logs/module: nginx
        co.elastic.logs/fileset.stdout: access
        co.elastic.logs/fileset.stderr: error
    spec:
      containers:
      - name: webapp
        image: nginx:1.22.1
        ports:
        - containerPort: 80
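For what it's worth, outside Kubernetes the same hints can be set as container labels instead of annotations, e.g. in a (hypothetical) Compose file:

```yaml
services:
  webapp:
    image: nginx:1.22.1
    labels:
      co.elastic.logs/module: "nginx"
      co.elastic.logs/fileset.stdout: "access"
      co.elastic.logs/fileset.stderr: "error"
```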

Hope that helps someone out there! :smiley:


Do you mean the Filebeat nginx module directly reading the access/error log files?

Or something like this? NGINX Ingress controller + Filebeat with NGINX module - Elastic Stack / Beats - Discuss the Elastic Stack

My idea was to use the nginx module directly as shown here: Nginx module | Filebeat Reference [8.5] | Elastic
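Roughly like this in modules.d/nginx.yml (a sketch based on the module docs; the paths assume nginx writes to its default host-level log files):

```yaml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]   # assumed log location
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log*"]    # assumed log location
```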

Thanks for the ideas!
However, I don't know if that would be easy to set up, since the logs come from containers running on top of Kubernetes, so the log lines actually carry some additional information and formatting.
The file is also a merge of access.log (stdout) and error.log (stderr).

root@worker04:/var/log/pods# ls -lh
total 80K
drwxr-xr-x 3 root root 4.0K Jan  6 13:49 sample-webapp_webapp-deployment-f7c5c5596-jzhvr_3c360017-b15c-46d7-8da0-dd842f0bbc8e
root@worker04:/var/log/pods# ls -lh sample-webapp_webapp-deployment-f7c5c5596-jzhvr_3c360017-b15c-46d7-8da0-dd842f0bbc8e/
total 4.0K
drwxr-xr-x 2 root root 4.0K Jan  6 13:49 webapp
root@worker04:/var/log/pods# ls -lh sample-webapp_webapp-deployment-f7c5c5596-jzhvr_3c360017-b15c-46d7-8da0-dd842f0bbc8e/webapp/
total 4.1M
-rw------- 1 root root 4.1M Jan  6 15:54 0.log

There's just one log file for a standard nginx container/pod; the access and error logs are written to a single file.

The log lines also have a different format; they follow the standard Kubernetes log prefix, I think.

root@worker04:/var/log/pods# head -n5 sample-webapp_webapp-deployment-f7c5c5596-jzhvr_3c360017-b15c-46d7-8da0-dd842f0bbc8e/webapp/0.log
2023-01-06T13:49:43.865470598+00:00 stdout F / /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-01-06T13:49:43.865470598+00:00 stdout F / Looking for shell scripts in /docker-entrypoint.d/
2023-01-06T13:49:43.866624107+00:00 stdout F / Launching /docker-entrypoint.d/
2023-01-06T13:49:43.871137409+00:00 stdout F info: Getting the checksum of /etc/nginx/conf.d/default.conf
2023-01-06T13:49:43.877291349+00:00 stdout F info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
root@worker04:/var/log/pods# tail -n5 sample-webapp_webapp-deployment-f7c5c5596-jzhvr_3c360017-b15c-46d7-8da0-dd842f0bbc8e/webapp/0.log
2023-01-06T15:59:56.943756171+00:00 stdout F webapp.apps.k8s.domain.lab - - [06/Jan/2023:15:59:56 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.61.1" ""
2023-01-06T15:59:57.065568724+00:00 stdout F webapp.apps.k8s.domain.lab - - [06/Jan/2023:15:59:57 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.61.1" ""
2023-01-06T15:59:57.187069588+00:00 stdout F webapp.apps.k8s.domain.lab - - [06/Jan/2023:15:59:57 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.61.1" ""
2023-01-06T15:59:57.551829353+00:00 stdout F webapp.apps.k8s.domain.lab - - [06/Jan/2023:15:59:57 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.61.1" ""
2023-01-06T15:59:57.674128175+00:00 stdout F webapp.apps.k8s.domain.lab - - [06/Jan/2023:15:59:57 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.61.1" ""

So, to keep everything simple and standard, I prefer to use autodiscover hints.
That way I only need to set up a general Kubernetes autodiscover config on the Filebeat side, then add annotations to every pod whose logs should be processed, which is more flexible per module and is self-service.
