Problem with Filebeat and Kafka

Hello,

I have a problem with my Filebeat + Kafka + Logstash setup.

A lot of files arrive every minute, and I am using Kafka between Filebeat and Logstash to handle the load.

Filebeat logs error messages like these:

2019-10-10T10:35:03.708+0200    ERROR   file/states.go:112      State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:20:00+02:00-2019-10-10T10:21:00+02:00_calv-vm25.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200    ERROR   file/states.go:112      State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:21:00+02:00-2019-10-10T10:22:00+02:00_calv-vm26.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200    ERROR   file/states.go:112      State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:21:00+02:00-2019-10-10T10:22:00+02:00_calv-vm41.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200    ERROR   file/states.go:112      State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:20:00+02:00-2019-10-10T10:21:00+02:00_calv-vm30.lker.fr.xml should have been dropped, but couldn't as state is not finished.

My Filebeat input:

- type: log
  enabled: true
  paths:
    - /home/test/Documents/PRODUCERS/Data/PmProducer/*robustness*.xml
  fields:
    #level: debug
    name: hdfs-pm-robu
  document_type: hdfs-pm-robu
  multiline.pattern: '<measInfo'
  multiline.negate: true
  multiline.match: after
  ignore_older: 2m
  scan_frequency: 15s
  close_inactive: 3m
  clean_inactive: 7m
  clean_removed: true

My Filebeat output:

output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["kafka1:9092","kafka2:9092","kafka3:9092"]

  # message topic selection + partitioning
  topic: '%{[fields.name]}'
  partition.round_robin:
    reachable_only: false

  #required_acks: 1
  #compression: gzip
  max_message_bytes: 100000000
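
On the Logstash side I just consume this topic with a kafka input, roughly like this (a simplified sketch, not my exact config; it assumes the same brokers and the topic produced from fields.name):

input {
  kafka {
    # same brokers as the Filebeat output
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
    # the topic name produced from fields.name
    topics => ["hdfs-pm-robu"]
    codec => "json"
  }
}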

When data arrives every minute I get this error message and lose data; when I generate data every 3 minutes there is no data loss, even though the total volume of data is the same in both cases.

Thank you for your help.

You should set clean_inactive to a higher value, otherwise it can cause data loss.
That may be what is causing the problem.
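
For example, something along these lines (just a sketch, the exact values depend on how often your files rotate; the important part is that clean_inactive must be greater than ignore_older + scan_frequency, with enough headroom for the harvester to finish):

  ignore_older: 2m
  scan_frequency: 15s
  close_inactive: 3m
  # must be > ignore_older + scan_frequency, with headroom so the state
  # is only removed after the harvester has finished the file
  clean_inactive: 15m
  clean_removed: true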

Thank you for your reply, but I am still having the same problem. I set clean_inactive to 10m and commented out the other parameters.

Which other parameters did you comment out?
Can you please post your latest filebeat.yml?
