Filebeat not sending all logs from file

Hi, I have a CSV file that is generated every Monday, overwriting the previous one. Filebeat monitors that file and sends it to Kafka. Sometimes it sends all the logs, but a few times it sends only some of them, e.g. 200-1000 out of 3M lines. When that happens, if I rename the file, or move it somewhere and back, Filebeat reads the file again and works properly.

This is the filebeat config.

filebeat.prospectors:
- input_type: log
  paths:
    - C:\path\to\report.csv
  document_type: type
tags: ["topic"]
output.kafka:
  enabled: true
  hosts: ["kafka1:9090","kafka2:9090","kafka3:9090"]
  topic: topic1
  
#================================ Logging =====================================
logging.level: info

Does anyone know what the issue could be? The file is not a continuously streaming log, just a weekly dump. Are there any settings in the config for this kind of use case?

This is the output from the Filebeat log when only a few logs were sent to Kafka.

2017-10-16T02:47:27-07:00 INFO Harvester started for file: C:\path\to\report.csv
2017-10-16T02:47:32-07:00 WARN kafka message: Initializing new client
2017-10-16T02:47:32-07:00 WARN client/metadata fetching metadata for all topics from broker kafka1:9090
2017-10-16T02:47:32-07:00 WARN Connected to broker at usmvkafka1:9092 (unregistered)
2017-10-16T02:47:32-07:00 WARN client/brokers registered new broker #2 at kafka3:9090
2017-10-16T02:47:32-07:00 WARN client/brokers registered new broker #1 at kafka2:9090
2017-10-16T02:47:32-07:00 WARN client/brokers registered new broker #0 at kafka1:9090
2017-10-16T02:47:32-07:00 WARN kafka message: Successfully initialized new client
2017-10-16T02:47:32-07:00 WARN producer/broker/0 starting up
2017-10-16T02:47:32-07:00 WARN producer/broker/0 state change to [open] on topic/1
2017-10-16T02:47:32-07:00 WARN producer/broker/2 starting up
2017-10-16T02:47:32-07:00 WARN producer/broker/2 state change to [open] on topic/0
2017-10-16T02:47:32-07:00 WARN producer/broker/1 starting up
2017-10-16T02:47:32-07:00 WARN producer/broker/1 state change to [open] on topic/2
2017-10-16T02:47:32-07:00 WARN Connected to broker at kafka1:9090 (registered as #0)
2017-10-16T02:47:32-07:00 WARN Connected to broker at kafka2:9090 (registered as #1)
2017-10-16T02:47:32-07:00 WARN Connected to broker at kafka3:9090 (registered as #2)

This is the output from when everything worked as intended after I renamed the file and renamed it back.

2017-10-16T09:12:46-07:00 INFO Harvester started for file: C:\path\to\report.csv
2017-10-16T09:12:46-07:00 WARN producer/broker/2 state change to [closing] because kafka: broker not connected
2017-10-16T09:12:46-07:00 WARN producer/broker/1 state change to [closing] because kafka: broker not connected
2017-10-16T09:12:46-07:00 WARN producer/broker/0 state change to [closing] because kafka: broker not connected
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/0 state change to [retrying-1]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/0 abandoning broker 2
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/2 state change to [retrying-1]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/2 abandoning broker 1
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/1 state change to [retrying-1]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/1 abandoning broker 0
2017-10-16T09:12:46-07:00 WARN producer/broker/2 shut down
2017-10-16T09:12:46-07:00 WARN producer/broker/0 shut down
2017-10-16T09:12:46-07:00 WARN producer/broker/1 shut down
2017-10-16T09:12:46-07:00 WARN Connected to broker at kafka3:9090 (registered as #2)
2017-10-16T09:12:46-07:00 WARN client/metadata fetching metadata for [topic] from broker kafka3:9090
2017-10-16T09:12:46-07:00 WARN client/metadata fetching metadata for [topic] from broker kafka3:9090
2017-10-16T09:12:46-07:00 WARN client/metadata fetching metadata for [topic] from broker kafka3:9090
2017-10-16T09:12:46-07:00 WARN producer/broker/2 starting up
2017-10-16T09:12:46-07:00 WARN producer/broker/2 state change to [open] on topic/0
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/0 selected broker 2
2017-10-16T09:12:46-07:00 WARN producer/broker/0 starting up
2017-10-16T09:12:46-07:00 WARN producer/broker/1 starting up
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/0 state change to [flushing-1]
2017-10-16T09:12:46-07:00 WARN producer/broker/1 state change to [open] on topic/2
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/0 state change to [normal]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/1 selected broker 0
2017-10-16T09:12:46-07:00 WARN producer/broker/0 state change to [open] on topic/1
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/2 selected broker 1
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/1 state change to [flushing-1]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/1 state change to [normal]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/2 state change to [flushing-1]
2017-10-16T09:12:46-07:00 WARN producer/leader/topic/2 state change to [normal]
2017-10-16T09:12:46-07:00 WARN Connected to broker at kafka2:9090 (registered as #1)

Thanks

Is it possible to rotate the old file out (move it) or delete it before writing the new data? The inode probably never changes, so Filebeat always views this as the same file and only sends more lines when the size grows beyond the current read offset. See the recommendations in the FAQ on inode reuse.

I see. Yes, I can try removing the old file from the directory. It is a Windows system, though, so will it still be an inode issue? Should I use the clean_inactive option?

On Linux file systems, Filebeat uses the inode and device to identify files.

On Windows it uses filesystem identifiers analogous to inodes, so it has the same problem. I think the clean_inactive option should work for your case.

So let's say I have the file coming in every Monday morning, and I delete it every Tuesday. What should the value of clean_inactive be? Do I have to set ignore_older as well?

I would try setting clean_inactive: 24h. You don't need to set ignore_older for this case.
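For reference, here is a minimal sketch of the prospector with that setting added, mirroring the config posted above (paths and values are illustrative, not a verified working config):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - C:\path\to\report.csv
  # Drop the file's state from the registry after it has been
  # inactive for 24 hours, so next Monday's dump, which reuses
  # the same file identifier, is read again from the beginning.
  clean_inactive: 24h
```

Note that clean_inactive only removes state for files that are no longer being harvested, which is why deleting or rotating the old file before the new dump is written matters here.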


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.