Filebeat exclude_files is not working as expected

Hi everyone,
I have the following directory structure, and I am trying to avoid duplicate events by excluding the "current" dir:

# ls -l
total 12
drwxrwxr-x 11 node node 4096 May 25 10:42 3.123.0
drwxrwxr-x 11 node node 4096 May 30 10:16 3.124.0
lrwxrwxrwx  1 node node   12 May 30 10:16 current -> /ver/3.124.0
drwxrwxr-x  2 node node 4096 Jun  1  2020 logs

This is my Filebeat input configuration (YAML):


############################## Inputs ##################################
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  exclude_files: '/^\/ver\/(current)\/logs.+.log/gm'

The regexp was verified with https://regex101.com/.
However, I still see logs from the current dir arriving in Kibana.

Any help would be much appreciated.
Thanks,
Omer.

Hi @omeryosef, welcome to the community.

You are using log input syntax in a filestream input, which will not work. The filestream input uses this form instead:

filebeat.inputs:
- type: filestream
  ...
  prospector.scanner.exclude_files: ['\.gz$']
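
Applied to the paths from the question, a minimal sketch might look like the following. It assumes the goal is to skip only files reached through the current symlink, and that the pattern is matched against the full path of each file. It also assumes the pattern is written as a plain expression in Filebeat's RE2-based syntax: the surrounding /.../ delimiters and the g/m flags that regex101 displays are not part of that syntax, so keeping them turns them into literal characters that the path would have to contain.

```yaml
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  # Assumption: plain RE2 pattern, no /.../gm wrapper; single quotes keep
  # the backslash in \.log from being interpreted by YAML.
  prospector.scanner.exclude_files: ['^/ver/current/logs/.+\.log$']
```

Forward slashes need no escaping in this syntax, and the unescaped dot before log in the original pattern would match any character, so it is escaped here.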



Hi,
Thanks for your reply!
Regarding the type: I am working with Filebeat 7.16, and the docs say the log type is deprecated, so I am using filestream and assumed that the rest of the syntax is the same.

Anyway, using the filestream type with these options didn't work either (I still see logs from the current folder).

I tried both:
prospector.scanner.exclude_files: ['/^/ver/(current)/logs.+.log/g']
prospector.scanner.exclude_files: ['/^/ver/(current)/logs.+.log/gm']

############################## Inputs ##################################
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  prospector.scanner.exclude_files: ['/^\/ver\/(current)\/logs.+.log/g']

tags: ["portal","node-5","node-6"]

Not sure what I am missing...

Hi @omeryosef

From the listing you shared, current is a symlink to /ver/3.124.0, so /ver/3.124.0 is the actual current folder and is not excluded, right?

Right, and I see events reported from the current path (duplicate events).
BTW, it didn't work with the log type either, using this config:

############################## Inputs ##################################
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  exclude_files: '/^\/ver\/(current)\/logs.+.log/gm'

And I have validated the regex with https://regex101.com/, as I mentioned.

Hi @omeryosef

exclude_files takes a list, and symlinks are disabled by default:

  # If symlinks is enabled, symlinks are opened and harvested. The harvester is opening the
  # original for harvesting but will report the symlink name as source.
  #symlinks: false

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

I see, thanks a lot for bringing this to my attention.
I need events to indicate the original dir (in the ls -l output above, that would be 3.124.0).
To do that, I am using the following configuration:

############################## Inputs ##################################
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  exclude_files:
    - '^\/ver\/.+(current)\/logs.+.log'
  multiline.pattern: '^[[:upper:]]|^\[[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{2}\/[0-9]{2}\/[0-9]{4}|^[0-9]{2}-[[:alpha:]]{3}-[0-9]{4}|^\{'
  multiline.negate: true
  multiline.match: after

Currently I can see logs from 3.124.0 and I don't see logs from "current".
Can this configuration cause data loss? Or is it the right way to get all data attributed to the original dir (3.124.0) and not the symlink (current)?

Thanks a lot!
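
One thing worth checking in the last configuration: in '^\/ver\/.+(current)\/logs.+.log', the .+ in front of (current) requires at least one character between /ver/ and current, so a path such as /ver/current/logs/app.log (a hypothetical file name) does not match the pattern. The missing "current" events may therefore have another cause than this exclusion. A tighter variant, offered only as a sketch and assuming the goal is to skip just the files reached through the current symlink, could be:

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - "/ver/*/logs/*.log"
  exclude_files:
    # Assumption: only paths that go through the "current" symlink are dropped;
    # versioned directories such as /ver/3.124.0 do not match this pattern.
    - '^/ver/current/logs/.+\.log$'
  # multiline.* settings from the post above are unchanged and omitted here.
```

On the data loss question: as long as current keeps pointing at a directory under /ver, the same files are still matched by the paths glob through their versioned directory, so excluding the symlinked path should only remove the duplicate view of them.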