Harvesting same file with different prospector config

Does anyone see any issue with harvesting the same file from different Filebeat prospectors, each having its own custom fields and its own set of include/exclude regexes?

My use case is: I want Filebeat to parse a single file against several sets of include and exclude patterns and set a different field value for each set. Then, using when conditions in the Filebeat config, send the parsed data to different indices in Elasticsearch. Filebeat seems to do this pretty neatly, but does anyone see any issue with this approach?

Version - 5.1.1-1

filebeat:
  prospectors:
    -
      paths:
        - /var/log/test.log
      input_type: log
      document_type: alert
      ignore_older: 24h
      exclude_lines: ['^DBG']
      include_lines: ['^ERROR']
      scan_frequency: 10s
      backoff: 1s
      max_backoff: 10s
      fields:
        sev: "MAJOR"
        label: "ERROR"
      backoff_factor: 2
      force_close_files: false
      fields_under_root: false
      close_older: 2h
    -
      paths:
        - /var/log/test.log
      input_type: log
      document_type: alert
      ignore_older: 24h
      exclude_lines: ['java\.lang\.IllegalArgumentException: Document base']
      include_lines: ['ERROR LogMananger\.repositorySelector was null']
      scan_frequency: 10s
      backoff: 1s
      max_backoff: 10s
      fields:
         sev: "MINOR"
         label: "repo-issue"
      backoff_factor: 2
      force_close_files: false
      fields_under_root: false
      close_older: 2h
    -
      paths:
        - /var/log/test.log
      input_type: log
      document_type: alert
      ignore_older: 24h
      exclude_lines: ['java\.lang\.IllegalArgumentException: Document base']
      include_lines: ['SEVERE: Servlet\.service','Stopping service Catalina']
      scan_frequency: 10s
      backoff: 1s
      max_backoff: 10s
      fields:
        sev: "MINOR"
        label: "Servlet_Failure"
      backoff_factor: 2
      force_close_files: false
      fields_under_root: false
      close_older: 2h
output:
  elasticsearch:
    hosts: ['https://xxxxxxxxxx:443']
    index: filebeat-%{+yyyy.MM.dd}
    indices:
    - index: "alert-%{+yyyy.MM.dd}"
      when.contains:
        type: "alert"
name: grafana-mon-GrafanaApp-15FIOD3L34PR

Defining the same file in multiple prospectors will cause problems due to how the read offset is persisted to disk. And in later versions of Filebeat I suspect it will yield an error on startup.

Using Logstash to annotate the events would make sense.

Additionally, I could see this becoming a feature of Beats, whereby processors could be used to conditionally add tags and fields. For example:

processors:
- add_fields:
    when.regexp.message: 'repositorySelector'
    fields:
      label: "repo-issue"

So far I haven't seen any issues in the test environment. I do see that the registry file has a single entry (one inode) for test.log, and its offset keeps changing when I echo messages into the log file. Can you shed some light on why you think the way offsets are created and stored will cause an issue? What if I keep the rest of the prospector config the same (the options that deal with closing/scanning files) and only change the fields and regexes?

Also, is the processor feature available to use within Filebeat today? If yes, is there a not operator available in the when regexp condition, so that I can add fields only if the message matches ABC and also doesn't match XYZ (similar to the include_lines and exclude_lines feature)?

I think I found the link on processors, and it definitely looks like it handles my use case:
https://www.elastic.co/guide/en/beats/filebeat/5.4/configuration-processors.html. It looks like this feature is available from v5.4 onwards.
I am going to try it. Thanks for pointing me in the right direction.

Processors are available today, but there is no "add_fields" processor.
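
You can get partway there with the existing drop_event processor, since (if I remember the 5.x condition syntax correctly) it does support not. A rough sketch, with a hypothetical pattern value:

processors:
- drop_event:
    when:
      not:
        regexp:
          message: 'repositorySelector'

That only filters events out, though; it cannot add your sev/label fields, so the annotation itself would still have to happen elsewhere (e.g. in Logstash, as mentioned above).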

There will be 3 separate readers. Each will be persisting its read offset to the registry, but only one entry will be stored in the registry file. When you restart Filebeat, all three readers will resume from the same persisted offset, but that offset is possibly wrong for two of the three readers. So you either re-read some lines or you miss some lines because they were skipped over.
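
To make that concrete: the registry is keyed by the file (inode/device), not by prospector, so a single test.log only ever gets one entry roughly along these lines (values illustrative, field names as in the 5.x registry):

[
  {
    "source": "/var/log/test.log",
    "offset": 2045,
    "FileStateOS": {"inode": 530342, "device": 2049},
    "timestamp": "2017-06-07T10:15:00Z",
    "ttl": -1
  }
]

Whichever reader flushed its state last wins, which is why the other two can resume from the wrong offset after a restart.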

Thanks for the explanation on readers. Makes sense!
Do you think it makes sense to add an add_fields processor? Any timeline for when it could be expected?

I think add_fields would be useful in some situations. Do you want to open an enhancement request on GitHub for it?

Will definitely do. Thanks!
I was able to use processors for my use case, although it's a little clumsy. Having an option to add fields would greatly help.
