Filebeat occasionally loses logs: Dropping event: no topic could be selected (Kubernetes + Kafka)

Hi everyone,
I’ve been troubleshooting an issue for the last few days and would like to confirm whether this behavior is expected or if there is a workaround.


Problem

When Filebeat starts on a Kubernetes node (DaemonSet), sometimes it drops container logs produced in the first few seconds of a pod’s lifecycle. The logs never reach Kafka and cannot be recovered.

The only message in the Filebeat logs during the loss window is:

2025-11-04T11:48:19.764Z	ERROR	[kafka]	kafka/client.go:147	Dropping event: no topic could be selected
2025-11-04T11:48:19.764Z	ERROR	[kafka]	kafka/client.go:147	Dropping event: no topic could be selected
2025-11-04T11:48:19.764Z	ERROR	[kafka]	kafka/client.go:147	Dropping event: no topic could be selected
2025-11-04T11:48:19.764Z	ERROR	[kafka]	kafka/client.go:147	Dropping event: no topic could be selected
2025-11-04T11:48:19.764Z	ERROR	[kafka]	kafka/client.go:147	Dropping event: no topic could be selected

After a few seconds, Filebeat resumes sending logs normally. Only the early logs are gone.


Environment

Component          Value
Filebeat version   7.10.2
Platform           Kubernetes
Deployment         DaemonSet
Log source         /var/log/containers/*.log
Output             Kafka

Current Config (simplified)

filebeat.inputs:
- type: container
  enabled: true
  paths:
    - '/var/log/containers/*.log'
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
output.kafka:
  hosts: ["..."]
  username: "..."
  password: "..."
  topic: "%{[kubernetes.namespace]}"


This looks like a race condition: Filebeat starts harvesting logs before the Kubernetes metadata is resolved, so the topic interpolation fails and the event is dropped.


Questions

  1. Is this expected behavior today?

  2. Is there a recommended config pattern that guarantees zero log loss when using metadata-based topic routing, other than falling back to a default value like topic: "filebeat-%{[kubernetes.namespace]:pending}"?
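To make that concrete, the kind of workaround I mean and would rather avoid looks roughly like this; it stops the drops, but only by rerouting unmatched events to a made-up "filebeat-pending" topic, without fixing the missing metadata:

output.kafka:
  hosts: ["..."]
  # ":pending" is a format-string default: if kubernetes.namespace is missing,
  # the event goes to "filebeat-pending" instead of being dropped.
  topic: "filebeat-%{[kubernetes.namespace]:pending}"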


Happy to provide more info if helpful.

I can test patches, try alternative configs, or collect extra debug logs if needed.

Thanks a lot!

Hi @ruizengyang Welcome to the community.

Version 7.10 is incredibly old.. 6+ years.

There has been a HUGE amount of work on Filebeat and Kubernetes / container logs etc since then.

Your first step should be to update your stack to a new version as a matter of urgency.

Got it! Didn’t realize 7.10 was that old :sweat_smile:
I’ll upgrade to the newest version and see if the log-loss still happens.

Thanks a lot for the reply!


I’ve upgraded my stack to Filebeat 9.2.1 using the Docker image:

docker.elastic.co/beats/filebeat-wolfi:9.2.1

However, I’m still seeing the same error:

{"log.level":"error","@timestamp":"2025-12-01T13:49:02.555Z","log.logger":"kafka","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).Publish","file.name":"kafka/client.go","file.line":188},"message":"Dropping event: no topic could be selected","service.name":"filebeat","ecs.version":"1.6.0"}

I also noticed that this error occurs roughly every 20 seconds.

I’m not sure what else I can try. Any guidance or suggestions would be greatly appreciated!

Thanks in advance.

Hi @ruizengyang

A couple of things you can try to do:

You could try simplifying this without the matcher, just to see:

  processors:
    - add_kubernetes_metadata:

You can turn on Filebeat debug logging to see if there is additional information (is it a specific pod?).
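For example, something along these lines; I believe the selector names match the log.logger values you already see (kafka, kubernetes), or you can drop the selectors line to log everything:

logging.level: debug
# Limit debug output to the loggers relevant to this issue;
# remove this line (or use "*") to see all debug logs.
logging.selectors: ["kubernetes", "kafka"]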

You can add a condition in the topics setting to check whether kubernetes.namespace exists; if it does not, ship the event to a static topic. That way you can see which pod it came from, collect more info, etc.

See here

Something like

output.kafka:
  hosts: ["..."]
  username: "..."
  password: "..."
  topic: "no-topic"
  topics:
    - topic: "%{[kubernetes.namespace]}"
      when.has_fields: ['kubernetes.namespace']

That is interesting... what is going on every 20 seconds?

With respect to the race condition, if that is what it is... I'm not sure exactly how to solve that. I will poke around, but there seem to be many variables.

Also, could you please provide your latest filebeat.yml?

Thanks for the continued follow-up — I did more testing and collected additional details.


logging.level: debug
logging.selectors: "*"
fields:
  "cluster_id": "test"
  "cluster": "test"
filebeat.inputs:
- type: filestream
  id: k8s-app
  close.on_state_change.inactive: 1m
  close.on_state_change.removed: true
  ignore_inactive: since_last_start
  clean_inactive: 2h
  ignore_older: 70m
  paths:
    - '/var/log/containers/*.log'
  parsers:
    - container: ~
    - multiline:
        type: pattern
        pattern: '^\s'
        negate: false
        match: after
        max_lines: 500
        timeout: 1s
  prospector:
    scanner:
      fingerprint.enabled: true
      symlinks: true
      exclude_files: ['filebeat-.*\.log']
  file_identity.fingerprint: ~

  processors:
    - add_kubernetes_metadata:
        #host: ${NODE_NAME}
        #matchers:
        #  - logs_path:
        #      logs_path: "/var/log/containers/"

processors:
  - drop_event:
      when:
        or:
          - and:
              - equals:
                  kubernetes.namespace: "kube-system"
              - not:
                  equals:
                    kubernetes.container.name: "controller"
          - equals: { kubernetes.namespace: "default" }
          - equals: { kubernetes.namespace: "kube-vm" }
          - equals: { kubernetes.namespace: "gitlab-runner" }
          - equals: { kubernetes.namespace: "kube-node-lease" }
          - equals: { kubernetes.namespace: "kube-flannel" }
          - equals: { kubernetes.namespace: "argocd" }
          - equals: { kubernetes.container.name: "logstash" }
          - equals: { kubernetes.container.name: "filebeat" }

output.kafka:
  enable: true
  hosts: ["ip1:9092","ip2:9092","ip3:9092"]
  topic: "no-topic"
  topics:
    - topic: "%{[kubernetes.namespace]}"
      when.has_fields: ['kubernetes.namespace']
  key: '%{[kubernetes.pod.uid]}'
  required_acks: 1
  worker: 10
  compression: gzip
  max_message_bytes: 10000000

If I remove the matcher from add_kubernetes_metadata, then all logs are sent to the fallback topic (no-topic), and I noticed none of the events contain Kubernetes metadata fields (no kubernetes.*).
At the same time I see this debug log continuously:

{"log.level":"debug","log.logger":"kubernetes","message":"log.file.path value does not contain matcher's logs_path '/var/lib/docker/containers/', skipping..."}

Even though my container logs come from:

/var/log/containers/*.log

So it seems that with no matchers configured, the processor falls back to a default logs_path of /var/lib/docker/containers/, which never matches my files.


:test_tube: Current config where metadata is missing

(events are delivered, but the matcher is removed and metadata is missing)

processors:
  - add_kubernetes_metadata:
      #host: ${NODE_NAME}
      #matchers:
      #  - logs_path:
      #      logs_path: "/var/log/containers/"

In this config, all logs go to no-topic, because kubernetes.namespace does not exist.


:test_tube: When I restore the matcher

  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
output.kafka:
  enable: true
  hosts: ["ip1:9092","ip2:9092","ip3:9092"]
  topic: '%{[kubernetes.namespace]}'

I again start seeing:

Dropping event: no topic could be selected

and the frequency becomes very stable — around every 20 seconds (timestamps below). So it is not only happening right after startup.

2025-12-02T04:31:23.375Z
2025-12-02T04:31:33.382Z
2025-12-02T04:31:53.390Z
2025-12-02T04:32:13.391Z
2025-12-02T04:32:33.393Z
2025-12-02T04:32:53.397Z
2025-12-02T04:33:03.399Z
2025-12-02T04:33:43.411Z
2025-12-02T04:33:53.405Z
...

It looks like something periodically fails to assign metadata → topic interpolation fails → event is dropped.
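For the next round of testing I plan to keep the matcher restored and also keep the catch-all topic from the earlier suggestion, so the affected events land in Kafka instead of being dropped and I can inspect them. Roughly like this (broker addresses are placeholders):

output.kafka:
  enable: true
  hosts: ["ip1:9092","ip2:9092","ip3:9092"]
  # Catch-all: events missing kubernetes.namespace end up here instead of
  # being dropped, so their content (log.file.path etc.) can be inspected.
  topic: "no-topic"
  topics:
    - topic: "%{[kubernetes.namespace]}"
      when.has_fields: ['kubernetes.namespace']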

One more update that might be helpful:

I checked another Kubernetes cluster where Filebeat is still 7.10.2.
There, the Dropping event: no topic could be selected error also exists — but not periodically.

Example logs from that cluster:

2025-11-29T20:37:26.961Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T00:42:27.675Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T01:23:08.846Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T05:13:10.348Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T05:44:00.542Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T08:48:21.226Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T09:29:01.907Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T09:29:01.908Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected
2025-11-30T14:46:04.003Z  ERROR [kafka] kafka/client.go:147  Dropping event: no topic could be selected

So the “once every ~20 seconds” pattern I’m currently seeing on 9.2.1 may not be fundamental — the frequency might depend on workload or metadata availability rather than a fixed scheduler. I wanted to clarify that to avoid misleading conclusions.


For me, Dropping event: no topic could be selected is 100% reproducible, regardless of restart or load.
Given how common K8s + Kafka + metadata-based routing is, I am very surprised I don’t see other users reporting it.

I’m happy to test any config suggestion, collect more debug logs, or run a custom build if needed.

Thanks again for the time and help!

Just a short clarification:

Dropping event: no topic could be selected is not limited to pod startup.
Even for pods that have been running for a long time, I still see occasional events without any kubernetes.* fields — and those get dropped as well.

So metadata can fail mid-lifecycle too, causing permanent log loss.

@ruizengyang Thanks for the details

Sorry about the detour on the matcher... matchers have always been a bit confusing (I thought that if the matcher was removed, it acted as an "all" condition; apparently not).

At this point, I do not have a simple answer

I am looking at the reference docs here

And this is the reference yml

It looks like you are mostly, though not exactly, following the reference, but I don't think the differences are the issue.

Here is the reference implementation

filebeat.inputs:
- type: filestream
  id: container-logs
  prospector.scanner.symlinks: true
  parsers:
    - container: ~
  paths:
    - /var/log/containers/*.log
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
          - logs_path:
              logs_path: /var/log/containers/

A couple of items:

Have you tried the Autodiscover approach (the section below, from the reference config)? Does it result in the same issue?

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: filestream
        id: kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
        paths:
          - /var/log/containers/*-${data.kubernetes.container.id}.log
        parsers:
          - container: ~
        prospector:
          scanner:
            fingerprint.enabled: true
            symlinks: true
        file_identity.fingerprint: ~

Question that I still did not see answered:

When you have the matcher running, most of the logs get routed to the namespace-based topic and a few get routed to no-topic because the namespace is missing... did you look at the actual logs/messages that end up in "no-topic" and check whether they come from an unexpected container/path? Is there something different about those logs / paths / containers?

Of course, we have not even delved into the K8s flavor, version, config, etc., which I am not sure we can do here.

If I get a chance, I will deploy Filebeat to a GKE cluster and see whether I observe similar behavior.

I do not use the Kafka output, so I will need to think about how to test this (how I would spot events without kubernetes.namespace).
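One option might be to take Kafka out of the picture and write any metadata-less events to a local file for inspection, roughly like this (the path and filename are just placeholders):

processors:
  # For this test only: keep just the events that are missing
  # kubernetes.namespace and drop everything that was enriched correctly.
  - drop_event:
      when:
        has_fields: ['kubernetes.namespace']

output.file:
  path: "/tmp/filebeat-debug"
  filename: missing-namespace.ndjson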

I will also poke around and see if anyone internal has seen this. I did a quick search and I am not seeing this reported as an issue; I know there were some race conditions at some point, but the fact that this repeats at a regular interval seems odd.

I haven’t tested autodiscover yet. I’m planning to try it next. But I remember seeing some discussions saying autodiscover has limited support in some cases, so it may not always be recommended.

The logs routed to "no-topic" don't show any special pattern. For example, with the nginx ingress pod, most logs go to the %{[kubernetes.namespace]} topic, but occasionally a few lines still end up in "no-topic" with no obvious difference.

Hi @ruizengyang

Sorry just got back to this.

I did some testing and here are my results:

  1. Yes, I see the missing kubernetes.namespace with filebeat inputs on my cluster; there does indeed seem to be some race condition.

  2. I did NOT see the missing kubernetes.namespace when I used autodiscover.

  3. I did NOT see the missing kubernetes.namespace when I used the Elastic Agent Kubernetes integration.

Also, note that there is some difference if you are running a sidecar like Istio... specifically, it looks like that was the difference in case 1 above.