Usage of filestream

Hello,

I have a few questions about the filestream input and its differences from the log input.

  1. Do I only need an id when using multiple Filebeat inputs in a single .yml, or always? Currently I'm not using any ids, but I'm curious whether I may run into trouble.

  2. Can prospector options be nested, e.g. like this:

prospector
  scanner
    exclude_files: ...
    check_interval: ...
  3. How does exclude_files work? In the migrating-to-filestream docs (Step 2: Exclude all processed files | Filebeat Reference [8.5] | Elastic) they have:
  paths:
    - /var/log/my-application*.json
  prospector.scanner.exclude_files: my-application[1-2]{1}.log

Does this mean that my-application*.log is excluded from the path /var/log/, or where does the exclusion happen?

  4. How are multiple excluded files separated? I'd assume it's ['file1_pattern', 'file2_pattern']?

  5. I'm using scan_frequency with type filestream, so according to the renaming table (Step 3: Use new option names | Filebeat Reference [8.5] | Elastic) it shouldn't work. Do both options still work anyway, or do I have to rename the option in every .yml?

  6. What is the difference between e.g. paths: /var/log/*.log and include_files: /var/log/*.log? When would I use one over the other?

Thanks in advance,
Ossenfeld

Anyone on question 1 at least? :slight_smile:

@faec Could you help with these questions please? Thanks!!

Might it help if I open a thread for every question?

  1. We recommend always using an id, but current releases will assign a default for single inputs.
  2. Yes, but prospector and scanner should be followed by a colon :
  3. In filestream this parameter is a list, so you should instead use exclude_files: ["my-application[1-2]{1}.log", "some-other-pattern..."]. The exclusions are regular expressions, and any file that matches such a regular expression will not be ingested.
  4. (see previous example)
  5. No, if you switch to the filestream input then any instances of the old scan_frequency parameter should be replaced with prospector.scanner.check_interval
  6. paths specifies where the input should look for possible files. If you want to ingest all files matching those paths, then there's no need to do anything else. If you want to only ingest some of those files, then adding a regular expression to include_files will only ingest files that are in one of the configured paths and match the given regular expression.
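Putting the answers above together, a filestream input using the new option names might look like the sketch below. The id, paths, and patterns here are illustrative placeholders, not recommendations:

```yaml
- type: filestream
  id: my-filestream-id                    # answer 1: always set an explicit id
  paths:
    - /var/log/my-app/*                   # answer 6: where the input looks for files
  prospector:                             # answer 2: nested options, each level followed by a colon
    scanner:
      check_interval: 10s                 # answer 5: replaces the old scan_frequency
      exclude_files: ['\.gz$', '\.tmp$']  # answers 3/4: a list of regular expressions
      include_files: ['\.log$']           # answer 6: of the files found under paths, ingest only matches
```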

exclude_files seems to be buggy, I think. I used the following two Filebeat configurations:

# vim: ft=yaml

- type: filestream
  id: json-collector
  paths:
    - /var/log/parser-testing/*
  fields:
    parser.test: "json_only"
  fields_under_root: true
  ignore_older: 30m
  close.on_state_change.inactive: 5m
  prospector:
    scanner:
      check_interval: 1s
      exclude_files: [".*.log"]
  parsers:
    - ndjson:
        keys_under_root: true
        expand_keys: true
        add_error_key: true

and

---
# vim: ft=yaml

- type: filestream
  id: log-collector
  paths:
    - /var/log/parser-testing/*
  fields:
    parser.test: "json_excluded"
  fields_under_root: true
  ignore_older: 30m
  close.on_state_change.inactive: 5m
  prospector:
    scanner:
      check_interval: 1s
      exclude_files: [".*.json"]
  parsers:
    - multiline:
        type: pattern
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        match: after
        negate: true

Files named xyz.log are collected, but xyz.json isn't. When I remove exclude_files, everything, including xyz.json, is collected. I could just specify the paths as *.log and *.json, but I'd really like to know what's going on with exclude_files.
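One detail worth ruling out (just an observation, not a confirmed cause of the behavior above): exclude_files patterns are regular expressions, so an unescaped `.` matches any character, and an unanchored pattern matches anywhere in the name. A stricter form would escape the dot and anchor the pattern, e.g.:

```yaml
exclude_files: ['\.log$']   # literal dot, anchored to the end of the file name
```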

Any ideas?

Maybe one last bump :smiley: