Hi @yquirion
I am a little confused why, after the very detailed answer I gave, you did not confirm whether you could reproduce the same results.
At this point, I am going to assume you could.
In general, if you are going to try to recreate a module with plain inputs, you will need to look at the ingest pipeline and whatever else the module is doing. You can see this under the module directory, in this case:
cd ./module/system/auth/
So first, the timezone.
This is simple: the module adds the add_locale processor, which adds the timezone, and then the pipeline uses that to adjust the syslog timestamp... a plain filestream input does not know anything about that.
You can replicate that yourself; the simple fix is below.
- type: filestream
  # Change to true to enable this input configuration.
  enabled: true
  pipeline: filebeat-8.6.2-system-auth-pipeline

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/syslog-pipeline/fsci-secure.log
  index: 'filebeat-auth-8.6.2-sys-linux'
  tags: "preserve_original_event"
  processors:
    - add_locale: ~
How did I know this? I went and looked at that module in detail.
From the Filebeat directory (this is where the module definitions live; if you want a plain input to work like a module, you need to look in these directories and understand what the module is doing):
cd ./module/system/auth/
cat config/auth.yml
And if you are going to use the ingest pipeline "outside" the module, you really need to look at it and understand it as well.
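If the pipeline has already been loaded into Elasticsearch (for example by running filebeat setup), you can also pull it up and read it in Kibana Dev Tools; the name below is the pipeline referenced in the config above:

GET _ingest/pipeline/filebeat-8.6.2-system-auth-pipeline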
You can see here that the date processor uses the timezone:
{
  "date": {
    "target_field": "@timestamp",
    "formats": [
      "MMM d HH:mm:ss",
      "MMM dd HH:mm:ss",
      "ISO8601"
    ],
    "timezone": "{{{ event.timezone }}}",
    "if": "ctx.event?.timezone != null",
    "field": "system.auth.timestamp",
    "on_failure": [
      {
        "append": {
          "value": "{{{ _ingest.on_failure_message }}}",
          "field": "error.message"
        }
      }
    ]
  }
}
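To make that concrete, here is a rough worked example (the year and the +02:00 offset are assumptions for illustration; a syslog timestamp carries neither):

# parsed from the log line by the earlier grok:
system.auth.timestamp: "Sep 15 11:56:18"
# added by add_locale on the Filebeat side:
event.timezone: "+02:00"
# The date processor interprets the timestamp in that offset, so
# @timestamp ends up representing 2023-09-15 11:56:18 +02:00
# (09:56:18 UTC) instead of the time being misread as UTC.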
For your next question about the message field: again, you just need to look at the ingest pipeline and follow the logic, and/or run _simulate?verbose=true to see what is happening...
When this message is sent through the pipeline it is "fully" parsed, and thus there is no message left: all the pertinent data ends up in fields.
Sep 15 11:56:18 dinf-miro sshd[164528]: Accepted password for myuser from 10.3.1.2 port 39100 ssh2
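If you want to watch that happen step by step, here is a minimal sketch of a verbose simulate request you could run in Kibana Dev Tools (the pipeline name is the one from the config above; the timezone value is only an example):

POST _ingest/pipeline/filebeat-8.6.2-system-auth-pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "message": "Sep 15 11:56:18 dinf-miro sshd[164528]: Accepted password for myuser from 10.3.1.2 port 39100 ssh2",
        "event": {
          "timezone": "+02:00"
        }
      }
    }
  ]
}

The verbose output shows the document after every processor, so you can see exactly where message disappears and where each field gets created.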
Go look at the code and you should see...
1st) The message gets renamed to event.original
2nd) There is an initial syslog parse; the leftover content ends up in _temp.message
3rd) Then there are subsequent groks...
The first one tries a bunch of combinations, and if nothing matches, it puts _temp.message back into message.
That is this grok:
{
  "grok": {
    "tag": "grok-specific-messages",
    "field": "_temp.message",
    "ignore_missing": true,
    "patterns": [
      "^%{DATA:system.auth.ssh.event} %{DATA:system.auth.ssh.method} for (invalid user)?%{DATA:user.name} from %{IPORHOST:source.ip} port %{NUMBER:source.port:long} ssh2(: %{GREEDYDATA:system.auth.ssh.signature})?",
      "^%{DATA:system.auth.ssh.event} user %{DATA:user.name} from %{IPORHOST:source.ip}",
      "^Did not receive identification string from %{IPORHOST:system.auth.ssh.dropped_ip}",
      "^%{DATA:user.name} :( %{DATA:system.auth.sudo.error} ;)? TTY=%{DATA:system.auth.sudo.tty} ; PWD=%{DATA:system.auth.sudo.pwd} ; USER=%{DATA:system.auth.sudo.user} ; COMMAND=%{GREEDYDATA:system.auth.sudo.command}",
      "^new group: name=%{DATA:group.name}, GID=%{NUMBER:group.id}",
      "^new user: name=%{DATA:user.name}, UID=%{NUMBER:user.id}, GID=%{NUMBER:group.id}, home=%{DATA:system.auth.useradd.home}, shell=%{DATA:system.auth.useradd.shell}$"
    ],
    "description": "Grok specific auth messages.",
    "on_failure": [
      {
        "rename": {
          "description": "Leave the unmatched content in message.",
          "field": "_temp.message",
          "target_field": "message"
        }
      }
    ]
  }
}
But your message DOES in fact match that second grok (the grok-specific-messages processor shown above) exactly, so it does not fail and never puts _temp.message back into message. That is the end of the processing: the event is fully parsed, and there is no need for a message field.
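For reference, worked out against your sample line, the leftover _temp.message ("Accepted password for myuser from 10.3.1.2 port 39100 ssh2") lines up with the first SSH pattern in that list, giving roughly these fields (my reading of the pattern, trimmed to just the grok output):

"system.auth.ssh.event": "Accepted",
"system.auth.ssh.method": "password",
"user.name": "myuser",
"source.ip": "10.3.1.2",
"source.port": 39100

Nothing is left over, so nothing ever needs to go back into message.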
If it had NOT matched, the pipeline would have tried the next grok, which tries to decode PAM messages (your messages do not fit that pattern), and so on.
This is why, for some of these specific messages, there is no leftover message field: the content is fully consumed.
Also, if you look at the bottom of the ingest pipeline, you will see...
{
  "remove": {
    "ignore_missing": true,
    "field": "event.original",
    "if": "ctx?.tags == null || !(ctx.tags.contains('preserve_original_event'))",
    "ignore_failure": true
  }
}
So if you just add the following tag to either the module or the input, event.original will be preserved:
tags: "preserve_original_event"
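If you are running the actual module rather than a plain input, a hedged sketch of the same idea in modules.d/system.yml might look like this (the input: block is the module-level way to override input settings; var.paths is just an example path):

- module: system
  auth:
    enabled: true
    var.paths: ["/var/log/secure*"]
    input:
      tags: ["preserve_original_event"]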
All your event.original issues are solved. So open up those ingest pipelines and read the code; the processors are executed in order.
(Note: not every pipeline is guaranteed to have that logic, so you need to look.)
Hope this helps