Filebeat 6.1 in Kubernetes - unable to fetch logs from new pods


#1
  • Filebeat version: 6.1.2
  • Kubernetes nodes are running on Google's GKE
  • Kubernetes version: 1.8.5-gke.0

Context

The only difference is the output: instead of Elasticsearch, we are using Kafka, creating a new topic for each app label in Kubernetes. The diff is:

-    processors:
-      - add_cloud_metadata:
-
-    cloud.id: ${ELASTIC_CLOUD_ID}
-    cloud.auth: ${ELASTIC_CLOUD_AUTH}
-
-    output.elasticsearch:
-      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
-      username: ${ELASTICSEARCH_USERNAME}
-      password: ${ELASTICSEARCH_PASSWORD}
+    output.kafka:
+      hosts: ["brokers.kafka.svc.cluster.local:9092"]
+      topic: '%{[kubernetes.labels.app]}'
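
For context, the resulting Kafka output section looks roughly like this (a sketch assembled from the diff above; every setting not shown here is left at its default):

    output.kafka:
      # Kafka brokers exposed inside the cluster
      hosts: ["brokers.kafka.svc.cluster.local:9092"]
      # One Kafka topic per Kubernetes app label
      topic: '%{[kubernetes.labels.app]}'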

The Problem
New pods' logs are not being picked up by Filebeat. They are only picked up if I delete the Filebeat pods; once the DaemonSet recreates them, the new logs are picked up.

As an example, if we create a sample Deployment in Kubernetes with 10 replica pods that write messages to stdout:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: log-test
  labels:
    app: my-custom-log
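
(For completeness, a fuller version of that Deployment might look like the sketch below; the busybox image and the echo loop are just assumptions to generate sample stdout logs:)

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: log-test
      labels:
        app: my-custom-log
    spec:
      replicas: 10
      template:
        metadata:
          labels:
            app: my-custom-log
        spec:
          containers:
            - name: logger
              image: busybox
              # Write a sample log line to stdout every second
              command: ["sh", "-c", "while true; do echo sample log line; sleep 1; done"]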

We should see a new topic in Kafka named my-custom-log (the value of the app label). However, the logs for those new pods are not picked up, so no topic is created:

$ kubectl exec kclient -n kafka -- /usr/bin/kafka-topics --zookeeper zookeeper:2181 --list | grep my-custom-log
$ 

However, once I shut down one of the Filebeat pods:

$ kubectl delete pod -n kube-system filebeat-sf9kz
pod "filebeat-sf9kz" deleted

It gets recreated:

filebeat-sf9kz   1/1   Terminating         0     4m
filebeat-sf9kz   0/1   Terminating         0     4m
filebeat-twlgc   0/1   Pending             0     0s
filebeat-twlgc   0/1   ContainerCreating   0     0s
filebeat-twlgc   1/1   Running             0     1s

And now the topic is there as expected:

$ kubectl exec kclient -n kafka -- /usr/bin/kafka-topics --zookeeper zookeeper:2181 --list | grep my-custom-log
my-custom-log

Is there anything we can do to fix this situation?

Thanks!


(Carlos Pérez Aradros) #2

Hi @thesilence, could you share the log output of one of the not-working Filebeat containers? I'm also interested in the logs of the one that works after recreation.


#3

Hi @exekias thanks a lot for the blazing fast reply!

This is the log of one of the filebeat containers that is not working after creating new pods: https://pastebin.com/Dqvg08Fx

And this is the log of the new one that got created after I deleted the first one: https://pastebin.com/1uQeD6zG

Let me know if you need any further information; I'll be happy to share it.

Many thanks!!


(Carlos Pérez Aradros) #4

Hi @thesilence,

That's strange; I don't see any errors in the output. I'm wondering whether topic: '%{[kubernetes.labels.app]}' could be causing an issue. Could you do a quick test after removing it? I'm trying to rule out stalls caused by an empty value for kubernetes.labels.app.

Best regards


#5

Hello @exekias,

thanks for your reply. I have made the following change:

topic: test01
#topic: '%{[kubernetes.labels.app]}'

and the logs do get picked up, so this is most probably an issue with the Kafka output.

All the logs are now being sent to the test01 topic (including those from newly created pods that are constantly writing sample logs to stdout), see https://pastebin.com/HmsKvcQN

However, if you look at the metadata in those JSON logs, you can see that the kubernetes.labels.app label actually exists:

"labels":{"app":"deploy01","pod-template-hash":"4035586195"},

but it is not resolved by the topic: '%{[kubernetes.labels.app]}' configuration.

Is there anything you can think of that we might be missing here?

Also, remember that if I recreate the Filebeat pod, the topic with the expected label deploy01 is instantly created, so this seems to be a runtime problem with the Kafka output.

What do you think?

Many thanks again for the cooperation, I really appreciate the help!


(Carlos Pérez Aradros) #6

Hi, thank you for your feedback, we :elasticheart: detailed reports like this!

So this is what I think is happening: for some log events, the Kubernetes metadata is not in place. This can happen especially when we read logs from old containers that are no longer running; since we cannot retrieve info about them from Kubernetes, their events are sent unannotated.

I think you can detect that situation and set a default topic for those events, while using the label for the rest, by means of the topics setting, which allows you to define a list of rules:

    topic: 'default'
    topics:
      - topic: '%{[kubernetes.labels.app]}'
        when: 
          regexp:
            kubernetes.labels.app: '.*'
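
The regexp condition above matches any non-empty value of the label. A slightly more direct alternative is the has_fields condition, which simply checks that the field is present on the event (a sketch; behavior should be equivalent for these events):

    topic: 'default'
    topics:
      - topic: '%{[kubernetes.labels.app]}'
        when:
          # Route to the per-app topic only when the label was annotated
          has_fields: ['kubernetes.labels.app']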

#7

@exekias no problem, always happy to provide as much detail as possible! :joy:

:elasticheart: x100 for your help, that config change did the trick!!!

Now everything is working great, and on top of that, we will be able to identify pods without an app label in our deployments.

Many thanks again, it feels great to have this setup working like a charm now :wink:


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.