Hello,
I have a problem with my Filebeat + Kafka + Logstash setup. A large number of files arrive every minute, so I put Kafka between Filebeat and Logstash to absorb the load.
Filebeat logs error messages like these:
2019-10-10T10:35:03.708+0200 ERROR file/states.go:112 State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:20:00+02:00-2019-10-10T10:21:00+02:00_calv-vm25.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200 ERROR file/states.go:112 State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:21:00+02:00-2019-10-10T10:22:00+02:00_calv-vm26.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200 ERROR file/states.go:112 State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:21:00+02:00-2019-10-10T10:22:00+02:00_calv-vm41.lker.fr.xml should have been dropped, but couldn't as state is not finished.
2019-10-10T10:35:03.708+0200 ERROR file/states.go:112 State for /home/test/Documents/PRODUCERS/Data/PmProducer/demo-C2019-10-10T10:20:00+02:00-2019-10-10T10:21:00+02:00_calv-vm30.lker.fr.xml should have been dropped, but couldn't as state is not finished.
My Filebeat input:
- type: log
  enabled: true
  paths:
    - /home/test/Documents/PRODUCERS/Data/PmProducer/*robustness*.xml
  fields:
    # level: debug
    name: hdfs-pm-robu
    document_type: hdfs-pm-robu
  multiline.pattern: '<measInfo'
  multiline.negate: true
  multiline.match: after
  ignore_older: 2m
  scan_frequency: 15s
  close_inactive: 3m
  clean_inactive: 7m
  clean_removed: true
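From what I have read, this error appears when Filebeat tries to clean a file's state while its harvester has not finished sending all of its events, for example because the output is blocked. The documentation says clean_inactive must be greater than ignore_older + scan_frequency; my values (7m > 2m + 15s) respect that, but maybe the margin is too small when Kafka backs up. Would loosening the timings along these lines help? (The values below are only a guess on my part, not something I have validated:)

  ignore_older: 5m
  scan_frequency: 15s
  close_inactive: 1m
  clean_inactive: 30m    # much more headroom before states are removed
  clean_removed: true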
My Filebeat output:
output.kafka:
  # initial brokers for reading cluster metadata
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  # message topic selection + partitioning
  topic: '%{[fields.name]}'
  partition.round_robin:
    reachable_only: false
  #required_acks: 1
  #compression: gzip
  max_message_bytes: 100000000
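One thing I am unsure about is max_message_bytes: 100000000 (~100 MB). As far as I know, Kafka brokers reject messages larger than their message.max.bytes setting, which defaults to about 1 MB, so raising the limit only on the Filebeat side may not be enough. If that is part of the problem, I suppose the brokers would also need to allow large messages, something like this on the broker side (hypothetical values, just mirroring my Filebeat setting):

  # server.properties on each Kafka broker
  message.max.bytes=100000000
  replica.fetch.max.bytes=100000000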
When data arrives every minute I get these errors and lose events; when I generate data every 3 minutes there is no data loss, even though the total volume of data is the same in both cases.
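To check whether the Kafka output is applying backpressure, I am thinking of enabling Filebeat's periodic internal metrics (these are standard logging options, nothing specific to my setup) and watching for growing retries or pipeline queue numbers in the log:

  logging.level: info
  logging.metrics.enabled: true
  logging.metrics.period: 30s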
Thank you for your help.