Filebeat fails to harvest if a file and a symlink to that file is in the same directory

treff7es1 · May 11, 2016, 9:14am

Hi,

It seems to me if symlink exists in the directory where filebeat should work on then filebeat won't harvest.
Our system is writing logs in a format where every hour we name the logfile with the actual hour in the filename and we have a symlink named with current which always points to the latest file like this:
log_2015-01-01_00001
log_2015-01-01_00002
current -> log_2015-01-01_0002

If I start filebeat and symlink is in the actual directory then filebeat fails to harvest any of the files.

I could reproduce this issue in the following way:

Create a directory
Put a logfile there (test.log for example)
Create a symlink to that logfile at the same directory (test.current for example)
Start filebeat and it won't harvest that file.

I tried it with filebeat-5.0.0-alpha2-x86_64 on Ubuntu 12.04.

I could not even find these files in the registry as well:
cat data/registry |grep test|wc -l
0

See my debug logs attached.

If I set an exclude on the symlink then everything works fine.

Here are the debug logs:
2016/05/11 08:56:37.311071 beat.go:276: INFO filebeat start running.
2016/05/11 08:56:37.311147 registrar.go:60: INFO Registry file set to: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/data/registry
2016/05/11 08:56:37.311214 registrar.go:70: INFO Loading registrar data from /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/data/registry
2016/05/11 08:56:37.311665 log.go:12: WARN client/brokers registered new broker #1003 at x.x.x.x
2016/05/11 08:56:37.311781 log.go:12: WARN client/brokers registered new broker #1002 at y.y.y.y
2016/05/11 08:56:37.311945 log.go:16: WARN kafka message: Successfully initialized new client
2016/05/11 08:56:37.312345 spooler.go:41: DBG Spooler will use the default spool_size of 2048
2016/05/11 08:56:37.312403 spooler.go:47: DBG Spooler will use the default idle_timeout of 5s
2016/05/11 08:56:37.312531 crawler.go:37: INFO Loading Prospectors: 1
2016/05/11 08:56:37.312576 crawler.go:42: DBG File Configs: [/opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/]
2016/05/11 08:56:37.312625 prospector.go:241: INFO Set ignore_older duration to 0
2016/05/11 08:56:37.312666 prospector.go:241: INFO Set scan_frequency duration to 10s
2016/05/11 08:56:37.312723 prospector.go:177: INFO buffer_size set to: 16384
2016/05/11 08:56:37.312791 prospector.go:190: INFO input_type set to: log
2016/05/11 08:56:37.312827 prospector.go:241: INFO Set backoff duration to 1s
2016/05/11 08:56:37.312856 prospector.go:201: INFO backoff_factor set to: 2
2016/05/11 08:56:37.312886 prospector.go:241: INFO Set max_backoff duration to 10s
2016/05/11 08:56:37.312932 prospector.go:211: INFO force_close_file is disabled
2016/05/11 08:56:37.312964 prospector.go:241: INFO Set close_older duration to 1h0m0s
2016/05/11 08:56:37.312993 prospector.go:222: INFO max_bytes set to: 10485760
2016/05/11 08:56:37.313075 prospector_log.go:40: DBG exclude_files: [.gz$]
2016/05/11 08:56:37.313116 prospector_log.go:79: DBG scan path /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/
2016/05/11 08:56:37.313281 prospector_log.go:92: DBG Check file for harvesting: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313295 spooler.go:75: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016/05/11 08:56:37.313522 prospector_log.go:150: DBG Start harvesting unknown file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313674 registrar.go:174: INFO New file. Start reading from the beginning: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313729 prospector_log.go:263: DBG Start / resuming harvester of file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313784 prospector_log.go:92: DBG Check file for harvesting: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.313837 prospector_log.go:150: DBG Start harvesting unknown file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.313711 registrar.go:77: INFO Starting Registrar
2016/05/11 08:56:37.313951 registrar.go:174: INFO New file. Start reading from the beginning: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.314127 prospector_log.go:251: DBG Launching harvester on renamed file. File rename was detected: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current -> /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.314217 log.go:227: DBG harvest: "/opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current" (offset snapshot:0)
2016/05/11 08:56:37.314287 log.go:41: INFO Harvester started for file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:42.313444 spooler.go:132: DBG Flushing spooler because of timeout. Events flushed: 0

Thanks,
Tamas

ruflin · May 12, 2016, 6:14am

@treff7es1 Thanks for the report. I will have a closer look at this.

ruflin · May 20, 2016, 10:52am

@treff7es1 I started to investigate this issue. The first thing I asked myself is what do we expect if there is a symlink to a file which is also in the same directory? Does it just add the orginal to our harvester list? Should we even "respect" symlinks or ignore them?

I couldn't reproduce the behaviour about not reading at all, but it currently reads the file twice (which is somehow expected as it treats the symlink like a file). Each symlink has its own inode ...

treff7es1 · May 20, 2016, 11:24am

@ruflin I would expect to harvest only the file where the symlink points and not add the symlink itself. I think that would be a great idea if it could be set if symlink should be skipped.

I will try the usecase again on my side to double check the not harvesting case.

ruflin · May 20, 2016, 11:35am

@treff7es1 I opened a Github issue here with this: https://github.com/elastic/beats/issues/1686 Lets move our conversation there.

Topic		Replies	Views
Filebeat 5.5.1 is not work with symlink file when symlinks option has been set true Beats filebeat	3	3812	September 4, 2017
No error for permission if symlinks involved in paths for filebeat Beats filebeat	4	800	October 17, 2017
New files detected but data not sent. offset remains at 0 Beats filebeat	7	2369	August 17, 2016
FileBeat misses lines at the beginning of a new file Beats filebeat	2	496	May 17, 2017
Maybe I missed something: FIlebeat prospectors seems to randomly stop harvesting 'some' files Beats filebeat	7	4065	July 5, 2017

Filebeat fails to harvest if a file and a symlink to that file is in the same directory

Related topics