Filebeat fails to harvest if a file and a symlink to that file is in the same directory


(Treff7es) #1

Hi,

It seems to me if symlink exists in the directory where filebeat should work on then filebeat won't harvest.
Our system is writing logs in a format where every hour we name the logfile with the actual hour in the filename and we have a symlink named with current which always points to the latest file like this:
log_2015-01-01_00001
log_2015-01-01_00002
current -> log_2015-01-01_0002

If I start filebeat and symlink is in the actual directory then filebeat fails to harvest any of the files.

I could reproduce this issue in the following way:

  1. Create a directory
  2. Put a logfile there (test.log for example)
  3. Create a symlink to that logfile at the same directory (test.current for example)
  4. Start filebeat and it won't harvest that file.

I tried it with filebeat-5.0.0-alpha2-x86_64 on Ubuntu 12.04.

I could not even find these files in the registry as well:
cat data/registry |grep test|wc -l
0

See my debug logs attached.

If I set an exclude on the symlink then everything works fine.

Here are the debug logs:
2016/05/11 08:56:37.311071 beat.go:276: INFO filebeat start running.
2016/05/11 08:56:37.311147 registrar.go:60: INFO Registry file set to: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/data/registry
2016/05/11 08:56:37.311214 registrar.go:70: INFO Loading registrar data from /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/data/registry
2016/05/11 08:56:37.311665 log.go:12: WARN client/brokers registered new broker #1003 at x.x.x.x
2016/05/11 08:56:37.311781 log.go:12: WARN client/brokers registered new broker #1002 at y.y.y.y
2016/05/11 08:56:37.311945 log.go:16: WARN kafka message: Successfully initialized new client
2016/05/11 08:56:37.312345 spooler.go:41: DBG Spooler will use the default spool_size of 2048
2016/05/11 08:56:37.312403 spooler.go:47: DBG Spooler will use the default idle_timeout of 5s
2016/05/11 08:56:37.312531 crawler.go:37: INFO Loading Prospectors: 1
2016/05/11 08:56:37.312576 crawler.go:42: DBG File Configs: [/opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/]
2016/05/11 08:56:37.312625 prospector.go:241: INFO Set ignore_older duration to 0
2016/05/11 08:56:37.312666 prospector.go:241: INFO Set scan_frequency duration to 10s
2016/05/11 08:56:37.312723 prospector.go:177: INFO buffer_size set to: 16384
2016/05/11 08:56:37.312791 prospector.go:190: INFO input_type set to: log
2016/05/11 08:56:37.312827 prospector.go:241: INFO Set backoff duration to 1s
2016/05/11 08:56:37.312856 prospector.go:201: INFO backoff_factor set to: 2
2016/05/11 08:56:37.312886 prospector.go:241: INFO Set max_backoff duration to 10s
2016/05/11 08:56:37.312932 prospector.go:211: INFO force_close_file is disabled
2016/05/11 08:56:37.312964 prospector.go:241: INFO Set close_older duration to 1h0m0s
2016/05/11 08:56:37.312993 prospector.go:222: INFO max_bytes set to: 10485760
2016/05/11 08:56:37.313075 prospector_log.go:40: DBG exclude_files: [.gz$]
2016/05/11 08:56:37.313116 prospector_log.go:79: DBG scan path /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/

2016/05/11 08:56:37.313281 prospector_log.go:92: DBG Check file for harvesting: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313295 spooler.go:75: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016/05/11 08:56:37.313522 prospector_log.go:150: DBG Start harvesting unknown file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313674 registrar.go:174: INFO New file. Start reading from the beginning: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313729 prospector_log.go:263: DBG Start / resuming harvester of file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:37.313784 prospector_log.go:92: DBG Check file for harvesting: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.313837 prospector_log.go:150: DBG Start harvesting unknown file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.313711 registrar.go:77: INFO Starting Registrar
2016/05/11 08:56:37.313951 registrar.go:174: INFO New file. Start reading from the beginning: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.314127 prospector_log.go:251: DBG Launching harvester on renamed file. File rename was detected: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current -> /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.log
2016/05/11 08:56:37.314217 log.go:227: DBG harvest: "/opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current" (offset snapshot:0)
2016/05/11 08:56:37.314287 log.go:41: INFO Harvester started for file: /opt/filebeat/filebeat-5.0.0-alpha2-x86_64/test/test.current
2016/05/11 08:56:42.313444 spooler.go:132: DBG Flushing spooler because of timeout. Events flushed: 0

Thanks,
Tamas


(ruflin) #2

@treff7es1 Thanks for the report. I will have a closer look at this.


(ruflin) #3

@treff7es1 I started to investigate this issue. The first thing I asked myself is what do we expect if there is a symlink to a file which is also in the same directory? Does it just add the orginal to our harvester list? Should we even "respect" symlinks or ignore them?

I couldn't reproduce the behaviour about not reading at all, but it currently reads the file twice (which is somehow expected as it treats the symlink like a file). Each symlink has its own inode ...


(Treff7es) #4

@ruflin I would expect to harvest only the file where the symlink points and not add the symlink itself. I think that would be a great idea if it could be set if symlink should be skipped.

I will try the usecase again on my side to double check the not harvesting case.


(ruflin) #5

@treff7es1 I opened a Github issue here with this: https://github.com/elastic/beats/issues/1686 Lets move our conversation there.


(system) #6