Encoded data in message field using filebeat filestream input

I'm using Filebeat to read a multiline log. The data reaches Elasticsearch, with each multiline event stored in the message field.

Log Sample:

Date: Wed Apr 19 09:57:45 2023

Computer Name: SystemX
User Name: SystemX.User
Project includes 1 folder(s) and 4 file(s).
encrypt mode:
set a password for this encryption:
using a user supplied password
set up a group and master password:
no encrypted with groupinfo
C:\Users\User\Desktop\Test files\File1.txt                                  8b6ccb43dca2040c3cfbcd7bfff0b387d4538c33              15bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File2.docx                                 a3dcef559e04628b1c71a1d87d353e070bd5d40a           11853bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File3.pptx                                 2ca33d9f81a91d2648971f5a12d03ec0ef9fc408           31579bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File4.xlsx                                 f4e15a60f7313fae60b9f05b0dc016ab6c68f031            8426bytes       2023/4/6 19:49:45

Filebeat.yml excerpt:

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  close_timeout: 5m

  # Unique ID among all inputs, an ID is required.
  id: "WinZip Safe Media"

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - "C:\\ProgramData\\WinZip Log Files\\*"
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  prospector.scanner.exclude_files: ['.zip$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  parsers:
    - multiline:
        type: pattern
        pattern: '^Date\:.*'
        negate: true
        match: after

Visualize in Discover: [screenshot omitted]

The issue is that I tried to create an ingest pipeline to parse the data into custom fields. My grok processor does not match because the data is coming in encoded. I can see this when viewing the document as JSON.

Here is my grok processor match statement:

    {
      "grok": {
        "field": "message",
        "patterns": [
          "(?m).*Date: %{DATA:event_timestamp}\\n\\n.*User Name: .*\\.%{DATA:user_name}\\n\\n.*Project.*encrypt mode:\\n\\n%{DATA:encrypt_algo}\\n\\n.*\\=\\n\\n%{GREEDYDATA:file_list}.*END OF FILE"
        ],
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "user.name",
        "value": "{{user_name}}",
        "ignore_failure": true
      }
    }

I've never had issues with the log input type in the past, so I'm not sure whether I'm missing something with the filestream input.

I tested the grok statement in Dev Tools Grok Debugger and it works fine.

Hello! It seems the encoding might not be detected correctly. You could try setting the encoding explicitly (see the filestream input page in the Filebeat Reference [8.7]) and see whether it makes a difference.

I did see that list of encodings but didn't see an example or any direction on where it gets set. Is there a setting for the filestream input in filebeat.yml? Normally the documentation shows an example, but in this case it does not.

I did try adding under filestream in the filebeat.yml:

encoding: plain

I also tried:

encoding: utf-8

That didn't seem to change anything.

Is the encoding setting nested right under the filestream definition or does it go somewhere else in the filebeat.yml?

It was the encoding. The file was UTF-16 LE with a BOM (utf-16le-bom). I found the file's encoding in Notepad++ and, after some trial and error on where to place the setting, got it working. In my case I put "encoding: utf-16le-bom" right under "- type: filestream" in filebeat.yml.

It would help others if the encoding option on the filestream page had an example, like the other configuration options do.
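
For anyone looking for the placement, here is a minimal sketch of the input described in this thread, assuming the same id, path, and multiline settings as the excerpt above. The encoding option sits at the input level, at the same indentation as id and paths. The value utf-16le-bom is specific to these files, so check your own file's encoding before copying it.

```yaml
filebeat.inputs:
# filestream input reading the UTF-16 LE (with BOM) log files
- type: filestream
  id: "WinZip Safe Media"
  enabled: true

  # Input-level option: same indentation as id and paths.
  # Value taken from this thread; adjust it to your file's actual encoding.
  encoding: utf-16le-bom

  paths:
    - "C:\\ProgramData\\WinZip Log Files\\*"

  # Join every line that does not start with "Date:" onto the previous event.
  parsers:
    - multiline:
        type: pattern
        pattern: '^Date\:.*'
        negate: true
        match: after
```

After changing the encoding, files that Filebeat has already read may need to be re-ingested before the grok processor sees correctly decoded text in the message field.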