Data loss in Filebeat to Kafka pipeline

Hi, I've been running a data pipeline for a few weeks now that uses Filebeat (v7.9.2) as a log harvester and publishes the data into Kafka. This new pipeline replaces an older one based on Filebeat and Logstash that has been running for years without any problems. When comparing the output of the two pipelines, I see that I'm missing a few messages every day in the new one. It is not much, between 10 and 20 out of more than 200k, but I need to be able to guarantee the delivery of every message.

My Kafka cluster (3 brokers) is configured so that it should not lose messages, but I'm not sure how to configure Filebeat to behave the same way.

My Kafka settings:

unclean.leader.election.enable=false
default.replication.factor=3
min.insync.replicas=2
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

My current Filebeat Kafka output settings:

output.kafka:
  version: '2.0.0'
  hosts:
    - 'k8s-c4-w1:30656'
    - 'k8s-c4-w2:30656'
    - 'k8s-c4-w3:30656'
  topic: 'P2_OSS_AND_IP.R'
  partition.round_robin:
    reachable_only: false
  required_acks: -1
  max_retries: -1
  compression: 'none'
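
Everything else is left at its defaults. For completeness, these are the other settings I've been looking at in case the queue or registry behaviour matters for delivery guarantees; the values below are just what I believe the 7.x defaults to be (apart from shutdown_timeout), not something I have changed, so please correct me if they are wrong:

# In-memory queue between the harvesters and the Kafka output
queue.mem:
  events: 4096            # max events buffered in memory
  flush.min_events: 2048  # minimum batch size forwarded to the output
  flush.timeout: 1s       # forward a smaller batch after this delay

# Registry and shutdown behaviour
filebeat.registry.flush: 0s       # write read offsets to disk after every acknowledged batch
filebeat.shutdown_timeout: 10s    # example value; wait for in-flight events to be acked on shutdown (default is 0, i.e. disabled)

I have not changed any of these except as noted, so I'm not sure whether they play a role in the missing messages.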

I know that the messages are getting lost between Filebeat and Kafka, as they never show up in the topic Filebeat is publishing to. There is no evidence to be found in either the Filebeat or the Kafka logs.

Do you have any recommendations on how to configure Filebeat to guarantee at-least-once message delivery?
