We are observing duplicate messages being published to Kafka from Filebeat, even when the Kafka brokers are up and stable. The issue occurs while using the disk queue (queue.disk) configuration.
# Kafka Output Configuration (unified for both sources)
output.kafka:
  hosts: ["xxxxxx:19092", "xxxxx:19093", "xxxx:19094"]  # All three brokers set to same IP
  topic: "${KAFKA_TOPIC}"
  key: '%{[UUID]}'
  codec.format:
    string: '%{[message]}'
  partition.hash:
    reachable_only: true
  required_acks: -1
  compression: gzip
  compression_level: 4
  max_message_bytes: 104857600  # 100MB - supports large Apache audit logs with attachments
  version: "2.1.0"
  client_id: "filebeat-prod"
  bulk_max_size: 1
  bulk_flush_frequency: 10ms
  channel_buffer_size: 256
  keep_alive: 30s
  max_retries: -1
  backoff.init: 1s
  backoff.max: 30s
  timeout: 2s
  broker_timeout: 10s
  worker: 1
  # Enhanced Security Configuration
  #sasl.mechanism: SCRAM-SHA-512
  #username: "${KAFKA_USERNAME}"
  #password: "${KAFKA_PASSWORD}"
  security.protocol: PLAINTEXT
  #ssl.enabled: true
  #ssl.verification_mode: full
  #ssl.certificate_authorities: ["/etc/filebeat/certs/ca-cert.pem"]
  #ssl.verification_mode: none
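A likely source of the duplicates, independent of the disk queue: with `required_acks: -1`, `max_retries: -1`, and `timeout: 2s`, any publish whose broker acknowledgement takes longer than 2s is treated as failed and retried, even though the broker already committed the first copy. A minimal Python sketch (not Filebeat internals; all names are illustrative) of that failure mode:

```python
# Sketch: why a producer timeout shorter than the broker's ack latency,
# combined with retries, duplicates events under at-least-once delivery.
# The "broker" (broker_log) commits every send it receives; the producer
# only observes an ack that arrives within `timeout` and otherwise
# resends the same event.

def publish_at_least_once(event, broker_log, ack_delay, timeout, max_attempts=3):
    """Resend `event` until an ack is observed within `timeout` seconds."""
    for attempt in range(1, max_attempts + 1):
        broker_log.append(event)     # broker durably commits this copy
        if ack_delay <= timeout:     # ack reached the producer in time
            return attempt
        # ack was late: the producer assumes failure and retries
    return max_attempts

log = []
publish_at_least_once("audit-event-1", log, ack_delay=5.0, timeout=2.0)
print(log.count("audit-event-1"))  # → 3: the broker stored three copies
```

With `bulk_max_size: 1` and `required_acks: -1`, every single event waits for a full in-sync-replica acknowledgement, so a 2s `timeout` (the Filebeat default is 30s) is easy to exceed under load; raising it above the worst-case ack latency should reduce retry-driven duplicates.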
# Console logging: kept enabled to surface error messages during troubleshooting
logging.to_stderr: true
# Performance and Monitoring Configuration
# Persistent disk queue for reliable, at-least-once delivery
# (note: at-least-once permits duplicates on retry; it does not mean exactly-once)
queue.disk:
  path: /var/lib/filebeat/disk-queue
  max_size: 20GB
  segment_size: 100MB
  read.buffer_size: 4MB
  write.buffer_size: 4MB
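Because the disk queue only guarantees at-least-once delivery (events not yet acknowledged when Filebeat restarts are re-sent), consumers should be built to tolerate duplicates. Since the output already sets `key: '%{[UUID]}'`, one option is consumer-side deduplication on the message key. A minimal sketch (a hypothetical helper, not part of Filebeat or any Kafka client library):

```python
from collections import OrderedDict

class UuidDeduplicator:
    """Drop messages whose Kafka key (the event UUID) was already seen.
    Bounded LRU set so memory stays flat on a long-running consumer."""

    def __init__(self, max_keys=1_000_000):
        self.max_keys = max_keys
        self._seen = OrderedDict()

    def is_new(self, key):
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh recency of this key
            return False
        self._seen[key] = None
        if len(self._seen) > self.max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True

# Duplicate keys ("a" and "b" re-delivered) are filtered out:
dedup = UuidDeduplicator(max_keys=100)
keys = ["a", "b", "a", "c", "b"]
print([k for k in keys if dedup.is_new(k)])  # → ['a', 'b', 'c']
```

The LRU bound is a trade-off: a key evicted from the window would be accepted again, so size it to comfortably cover the maximum re-delivery window (e.g. the disk queue's worst-case backlog).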