Filebeat fails to pick up other files

Hello, I've run into this type of issue:

2016-05-27T20:05:19Z INFO Old file with new name found: /flow_base_dir/dump/flows-201605270835.dump is no /flow_base_dir/dump/flows-201605272000.dump
2016-05-27T20:05:19Z INFO Detected rename of a previously harvested file: /flow_base_dir/dump/flows-201605270835.dump -> /flow_base_dir/dump/flows-201605272000.dump
2016-05-27T20:05:19Z INFO Registry file updated. 672 states written.
2016-05-27T20:05:29Z INFO Run prospector
2016-05-27T20:05:39Z INFO Run prospector
2016-05-27T20:05:49Z INFO Run prospector
2016-05-27T20:05:59Z INFO Run prospector
2016-05-27T20:06:06Z INFO Read line error: file inactive

These dump files are generated every five minutes and are unique, so Filebeat will pick them up and send them to Kafka. I have a cron job that runs every 4 hours and deletes older files. I wonder if that contributes in any way.

Is there a way to tell Filebeat to drop older files, or, I guess, to update the registry?

To be clear, you are saying that new files are being dropped into the path and are not being processed by Filebeat?

Are the filenames unique? Can you share your config file too?

Yes, new files are dropped into the path. Here is a snippet; the filenames are unique.

filebeat:
  prospectors:
    -
      paths:
        - /flow_base_dir/dump/*.dump
      document_type: netflow

logging:
  level: info

  # enable file rotation with default configuration
  to_files: true

  # do not log to syslog
  to_syslog: false

  files:
    path: /home/t/filebeat-5.0.0-alpha1-x86_64/var/log
    name: filebeat.log
    keepfiles: 7
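(For context: the snippet omits the output section. Since the logs show Kafka activated as the output plugin, a minimal Kafka output block would look roughly like the sketch below; the broker address and topic name are placeholders, not values from the original config.)

output:
  kafka:
    # placeholder broker list and topic name, not taken from the real config
    hosts: ["kafka-broker:9092"]
    topic: "netflow"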

After I restart Filebeat, things start working.

Hmmm... what are the logs saying? Is it recognizing the new files but not processing them?

A question about the log message below: are the files unique but generated multiple times? i.e., does jaren.log show up at 1 pm, get deleted, and then get created again at 2 pm?

2016-05-27T20:05:19Z INFO Detected rename of a previously harvested file: /flow_base_dir/dump/flows-201605270835.dump -> /flow_base_dir/dump/flows-201605272000.dump

Could it be that you hit this issue here? https://github.com/elastic/beats/issues/1341

The files are unique, since the filenames are formatted based on the date/time stamp.

I added ignore_older set to 15 minutes, and now Filebeat is picking up files that it wouldn't pick up before, but I am starting to see these:

2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 27] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 3] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 12] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 6] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 21] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 18] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 15] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 30] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 9] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 0] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[0 %!d(string=herd-netflow-wc1) 24] state change to [open] on %!s(MISSING)/%!d(MISSING)
2016-05-28T06:45:24Z WARN producer/broker/[1] starting up

In this case, nothing is being written to Kafka. Is this something with Filebeat, or should I check with Kafka? Things seem to pick up again by themselves.

ignore_older is supposed to address the inode reuse issue.
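For reference, the change amounts to adding that option to the prospector, roughly like this (the exact placement is assumed; 15m mirrors the 15 minutes mentioned above, so treat it as a sketch rather than the exact config):

filebeat:
  prospectors:
    -
      paths:
        - /flow_base_dir/dump/*.dump
      document_type: netflow
      # skip files whose last modification is older than 15 minutes (sketch)
      ignore_older: 15m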

The warnings above seem to be related to the Kafka output (@steffens?).

About ignore_older: ignore_older currently does not solve the inode reuse issue. For more details on how we plan to solve it, see this issue: https://github.com/elastic/beats/issues/1600

Which Filebeat version are you running?

The Kafka messages come from the Kafka circuit breaker because the output is unresponsive => no lines are being published.

I am using filebeat-5.0.0-alpha1-x86_64, and I meant close_older, not ignore_older. Is close_older also not functional?

So how can I clean up these dump files without affecting Filebeat? Does that mean I have to restart Filebeat every time I do this cleanup? I clean up every 4 hours.

Will updating the .filebeat registry by removing the records associated with the removed files be enough for Filebeat to release the inodes? If it doesn't require a restart, then I can probably go this route.
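For the cleanup question, something like the following is what I have in mind, assuming close_older really does release the file handle once a file has been inactive for the given duration; the values below are illustrative, not from the running config:

filebeat:
  prospectors:
    -
      paths:
        - /flow_base_dir/dump/*.dump
      document_type: netflow
      # close the harvester's file handle after 1 hour of inactivity,
      # well before the 4-hourly cron deletes the file (illustrative value)
      close_older: 1h
      # skip files not modified for more than 4 hours (illustrative value)
      ignore_older: 4h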

I think there is some misunderstanding of close_older and inode reuse, as they are not directly related. If something works with a restart, it should also work while keeping Filebeat running.

Your issue is that new files are not picked up until you restart? Can you please update to alpha3 and check whether your problems persist?

So, as suggested, I upgraded to alpha3 and stopped the cron job that restarted Filebeat. Let's see how it does.

2016-06-01T18:35:25Z INFO Home path: [/home/t/filebeat-5.0.0-alpha3-linux-x64] Config path: [/home/t/filebeat-5.0.0-alpha3-linux-x64] Data path: [/home/t/filebeat-5.0.0-alpha3-linux-x64/data] Logs path: [/home/t/filebeat-5.0.0-alpha3-linux-x64/logs]
2016-06-01T18:35:25Z INFO Setup Beat: filebeat; Version: 5.0.0-alpha3
2016-06-01T18:35:25Z INFO Activated kafka as output plugin.
2016-06-01T18:35:25Z INFO Publisher name: blah.com
2016-06-01T18:35:25Z INFO Flush Interval set to: 1s
2016-06-01T18:35:25Z INFO Max Bulk Size set to: 2048
2016-06-01T18:35:25Z INFO filebeat start running.
2016-06-01T18:35:25Z INFO Registry file set to: /home/t/filebeat-5.0.0-alpha3-linux-x64/data/registry
2016-06-01T18:35:25Z INFO Loading registrar data from /home/t/filebeat-5.0.0-alpha3-linux-x64/data/registry
2016-06-01T18:35:25Z INFO States Loaded from registrar: 37
2016-06-01T18:35:25Z INFO Loading Prospectors: 1
2016-06-01T18:35:25Z INFO buffer_size set to: 16384
2016-06-01T18:35:25Z INFO document_type set to: netflow
2016-06-01T18:35:25Z INFO Invalid input type set:
2016-06-01T18:35:25Z INFO input_type set to: log
2016-06-01T18:35:25Z INFO Set backoff duration to 1s
2016-06-01T18:35:25Z INFO backoff_factor set to: 2
2016-06-01T18:35:25Z INFO Set max_backoff duration to 10s
2016-06-01T18:35:25Z INFO force_close_file is disabled
2016-06-01T18:35:25Z INFO Set close_older duration to 1h0m0s
2016-06-01T18:35:25Z INFO max_bytes set to: 10485760
2016-06-01T18:35:25Z INFO Load previous states from registry into memory
2016-06-01T18:35:25Z INFO Previous states loaded: 37
2016-06-01T18:35:25Z INFO Loading Prospectors completed. Number of prospectors: 1
2016-06-01T18:35:25Z INFO All prospectors are initialised and running with 37 states to persist
2016-06-01T18:35:25Z INFO Start sending events to output
2016-06-01T18:35:25Z INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016-06-01T18:35:25Z INFO Starting Registrar
2016-06-01T18:35:25Z INFO Starting prospector of type: log
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011810.dump
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011745.dump
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011800.dump
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011530.dump
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011600.dump
2016-06-01T18:35:25Z INFO Harvester started for file: /flow_base_dir/dump/flows-201606011605.dump

Thanks. Let me know if you hit any issues.

I'm having the exact same issue: if I restart the service, it will grab the new logs, but it won't grab any more that are dropped afterwards. However, if I run it from the command prompt, it picks up all new logs:

filebeat -c filebeat.yml -e

I am using alpha3 as well. Was there a specific configuration you had to set to make this work?

I must say, the alpha3 version hasn't had any issues yet. Going to keep my fingers crossed. No issues with old files that have been deleted, nor with new files being picked up.


@kalapakim Can you share some more details on your OS?

Windows Server 2012 and I'm using Filebeat 5.0.0-alpha3.

There are more details in the issue I opened...

filebeat-not-polling-the-directory

Just an update:

I am no longer seeing the rename or "Old file with new name" messages in the Filebeat logs. It looks like the alpha3 install fixed the issues I was having. I compared the record count in my dump files with the count in Elasticsearch and it matches up, so I am not losing any data. The only thing I am still seeing is:

Read line error: file inactive

This should be due to my old-file cleanup. Does this mean that Filebeat has released the inode for that file? Does it update the Filebeat registry automatically?

Here is another issue I am seeing:

I am seeing files being reported as truncated, and I am using filebeat-5.0.0-alpha3-linux-x64:

2016-06-15T00:05:20Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150000.dump
2016-06-15T00:10:20Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150005.dump
2016-06-15T00:15:20Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150010.dump
2016-06-15T00:20:21Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150015.dump
2016-06-15T00:25:21Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150020.dump
2016-06-15T00:30:21Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150025.dump
2016-06-15T00:35:11Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150030.dump
2016-06-15T00:40:12Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150035.dump
2016-06-15T00:45:12Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150040.dump
2016-06-15T00:50:12Z INFO File was truncated. Begin reading file from offset 0: /flow_base_dir/dump/flows-201606150045.dump

I also noticed that filebeat-5.0.0-alpha4 has been posted; should I test that one?