Encoded data in message field using filebeat filestream input

I'm using Filebeat to read a multiline log. I'm able to get the data into Elasticsearch with each multiline event stored in the message field.

Log Sample:

Date: Wed Apr 19 09:57:45 2023

Computer Name: SystemX
User Name: SystemX.User
Project includes 1 folder(s) and 4 file(s).
============================================================================================
encrypt mode:
AS_ENCRYPT_MODE_AES256_SHA2
set a password for this encryption:
using a user supplied password
set up a group and master password:
unencrypted
no encrypted with groupinfo
============================================================================================
C:\Users\User\Desktop\Test files\File1.txt                                  8b6ccb43dca2040c3cfbcd7bfff0b387d4538c33              15bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File2.docx                                 a3dcef559e04628b1c71a1d87d353e070bd5d40a           11853bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File3.pptx                                 2ca33d9f81a91d2648971f5a12d03ec0ef9fc408           31579bytes       2023/4/6 19:49:45
C:\Users\User\Desktop\Test files\File4.xlsx                                 f4e15a60f7313fae60b9f05b0dc016ab6c68f031            8426bytes       2023/4/6 19:49:45
END OF FILE

Filebeat.yml excerpt:

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  close_timeout: 5m

  # Unique ID among all inputs; an ID is required.
  id: "WinZip Safe Media"

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - "C:\\ProgramData\\WinZip Log Files\\*"
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  prospector.scanner.exclude_files: ['.zip$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  parsers:

    - multiline:
        type: pattern
        pattern: '^Date\:.*'
        negate: true
        match: after

Viewing the event in Discover (screenshot omitted):

The issue is that when I create an ingest pipeline to parse the data into custom fields, my grok processor does not match because the data is arriving encoded. I can see this when viewing the document as JSON.
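For context on what the "encoded" data can look like: the file later turned out to be UTF-16LE, and when UTF-16 text is read as plain/UTF-8, every ASCII character is interleaved with a NUL byte, which is why byte-oriented grok matching fails. A quick standalone illustration in Python (not part of the pipeline):

```python
# Illustration: a UTF-16LE line viewed as raw bytes has a NUL byte after
# each ASCII character, so patterns written against plain text won't match.
line = "Date: Wed Apr 19 09:57:45 2023"
raw = line.encode("utf-16-le")
print(raw[:16])  # b'D\x00a\x00t\x00e\x00:\x00 \x00W\x00e\x00'
```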

Here is the relevant part of my ingest pipeline (the grok processor plus a set processor):

"processors": [
  {
    "grok": {
      "field": "message",
      "patterns": [
        "(?m).*Date: %{DATA:event_timestamp}\\n\\n.*User Name: .*\\.%{DATA:user_name}\\n\\n.*Project.*encrypt mode:\\n\\n%{DATA:encrypt_algo}\\n\\n.*\\=\\n\\n%{GREEDYDATA:file_list}.*END OF FILE"
      ],
      "ignore_failure": true
    }
  },
  {
    "set": {
      "field": "user.name",
      "value": "{{user_name}}",
      "ignore_failure": true
    }
  }
]

I've never had issues with the log input type in the past, so I'm not sure whether there's something I'm missing with this filestream input.

I tested the grok statement in Dev Tools Grok Debugger and it works fine.
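As a sanity check beyond the Grok Debugger, the whole pipeline can also be exercised with the _simulate API in Dev Tools, feeding it the message exactly as it appears in the indexed document (a sketch; the message body and pattern are abbreviated here):

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "(?m).*Date: %{DATA:event_timestamp}\\n\\n.*END OF FILE"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Date: Wed Apr 19 09:57:45 2023\n\n...\n\nEND OF FILE"
      }
    }
  ]
}
```

If a message copied from the indexed document fails here while hand-typed text succeeds, the stored bytes (rather than the pattern) are the problem.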

Hello! It seems like the encoding might not be correctly detected; maybe you could try explicitly setting the encoding (filestream input | Filebeat Reference [8.7] | Elastic) and see if it makes a difference.

I did see that list of encodings, but I didn't see an example or direction on where the setting goes. Is there a setting for the filestream input in filebeat.yml? The documentation normally shows an example, but in this case it does not.

I did try adding this under the filestream input in filebeat.yml:

encoding: plain

I also tried:

encoding: utf-8

Neither seemed to change anything.

Does the encoding setting nest directly under the filestream definition, or does it go somewhere else in filebeat.yml?

It was the encoding. The file had an unusual encoding (utf-16le-bom). I was able to see the file's encoding in Notepad++, and through trial and error on placement of the setting I got it working. In my case I put "encoding: utf-16le-bom" right under "- type: filestream" in filebeat.yml.
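For anyone who lands here later, a minimal sketch of the placement that worked for me (input ID, paths, and parser taken from my config excerpt above):

```yaml
filebeat.inputs:
  - type: filestream
    # encoding sits at the same level as id/paths, directly under the input
    encoding: utf-16le-bom
    id: "WinZip Safe Media"
    enabled: true
    paths:
      - "C:\\ProgramData\\WinZip Log Files\\*"
    parsers:
      - multiline:
          type: pattern
          pattern: '^Date\:.*'
          negate: true
          match: after
```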

It would be helpful for others to have an example of this like all the other configuration options for encoding on the filestream page.
