Logstash pipelines for parsing Filebeat data - no fileset in output

I've tried following https://www.elastic.co/guide/en/logstash/7.6/logstash-config-for-filebeat-modules.html to create Logstash pipelines that parse the Filebeat data.

However, even the examples provided in that link don't seem to apply to me. The example config begins with:

    filter {
      if [fileset][module] == "system" {
        if [fileset][name] == "auth" {
Well, when I look at my Filebeat logs and dump them straight to stdout or a file, there is no "fileset.module", "fileset.name", or even "fileset" anywhere in the events, so the example parsing config never matches. I don't understand whether the recommended config is wrong or whether there is something I still need to do to get "fileset" values into my Filebeat output. On the Filebeat side, I have the system module enabled and the output going to Logstash (on a custom port, not 5044). Other than that, it is a default install of Filebeat.
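For reference, here's roughly what the relevant parts of my setup look like (a sketch; the hostname and port are placeholders for my actual values):

    # filebeat.yml
    output.logstash:
      hosts: ["logstash.example.com:5055"]

    # modules.d/system.yml, enabled via `filebeat modules enable system`
    - module: system
      syslog:
        enabled: true
      auth:
        enabled: true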

Do you have any ideas on this?
Thanks for any help you can give.

Hi @mgotechlock, could you post here a couple of the filebeat events you dumped to stdout/file? Please make sure to redact any sensitive information in the events before posting.

Thanks,

Shaunak

{
         "input" => {
        "type" => "log"
    },
         "cloud" => {
        "instance" => {
            "id" => "156963326"
        },
        "provider" => "digitalocean",
          "region" => "nyc3"
    },
      "@version" => "1",
    "@timestamp" => 2020-04-17T18:08:02.238Z,
          "host" => {
                 "name" => "oompaloompa",
         "architecture" => "x86_64",
             "hostname" => "oompaloompa",
                   "os" => {
                "name" => "Ubuntu",
            "codename" => "bionic",
            "platform" => "ubuntu",
              "family" => "debian",
              "kernel" => "4.15.0-96-generic",
             "version" => "18.04.4 LTS (Bionic Beaver)"
        },
        "containerized" => false,
                   "id" => "fc61c6cc61c1434fbf7d14b4fbff55f6"
    },
           "ecs" => {
        "version" => "1.4.0"
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "message" => "Apr 14 21:24:08 oompaloompa sshd[6934]: Invalid user redis1 from 203.159.249.215 port 58702",
         "agent" => {
        "ephemeral_id" => "d9f4cc34-072e-4b08-bd19-247c9114c9e5",
             "version" => "7.6.2",
                "type" => "filebeat",
            "hostname" => "oompaloompa",
                  "id" => "469f2965-149e-4064-9e0c-2cd0669728b5"
    },
           "log" => {
          "file" => {
            "path" => "/var/log/auth.log"
        },
        "offset" => 943508
    }
}
         "input" => {
        "type" => "log"
    },
         "cloud" => {
        "instance" => {
            "id" => "156963326"
        },
          "region" => "nyc3",
        "provider" => "digitalocean"
    },
      "@version" => "1",
    "@timestamp" => 2020-04-17T18:08:02.238Z,
          "host" => {
                 "name" => "oompaloompa",
         "architecture" => "x86_64",
             "hostname" => "oompaloompa",
                   "os" => {
                "name" => "Ubuntu",
            "codename" => "bionic",
              "family" => "debian",
            "platform" => "ubuntu",
              "kernel" => "4.15.0-96-generic",
             "version" => "18.04.4 LTS (Bionic Beaver)"
        },
        "containerized" => false,
                   "id" => "fc61c6cc61c1434fbf7d14b4fbff55f6"
    },
           "ecs" => {
        "version" => "1.4.0"
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "message" => "Apr 14 21:24:08 oompaloompa sshd[6934]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.159.249.215",
         "agent" => {
        "ephemeral_id" => "d9f4cc34-072e-4b08-bd19-247c9114c9e5",
             "version" => "7.6.2",
                "type" => "filebeat",
            "hostname" => "oompaloompa",
                  "id" => "469f2965-149e-4064-9e0c-2cd0669728b5"
    },
           "log" => {
        "offset" => 943686,
          "file" => {
            "path" => "/var/log/auth.log"
        }
    }
}

Hmm, that's strange. I just tried to reproduce this with Filebeat 7.6.0 with the system module enabled and I'm seeing an event field in the events, which contains sub-fields like module and dataset.
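Concretely, the event object I see looks along these lines (a sketch of the relevant sub-fields, not a verbatim dump):

    "event" => {
         "module" => "system",
        "dataset" => "system.auth"
    }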

Could you please post the result of the following command (again, after redacting any sensitive information)?

    filebeat export config

Thanks,

Shaunak

So I made progress. Apparently, you are not supposed to have inputs enabled in filebeat.yml AND the modules enabled at the same time. Who knew? The documentation is terrible. Once I disabled the inputs in filebeat.yml, I see data in a better format, but it still doesn't match the config Elastic publishes at the original URL.
I do see fileset.name=auth and event.module=system, but the example config checks for fileset.module=system. I can easily change the config, but I would like confirmation that the example config Elastic publishes is incorrect, so I can be sure I am not doing anything wrong.

It's possible to mix "raw" Filebeat inputs in your configuration with modules. Imagine a case where you have logs from a well-known service like Apache or system logs but also have logs from your own application. You could ingest all of these logs with a single Filebeat instance by enabling the apache and system modules but also specifying your own "raw" inputs in the Filebeat configuration.
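A sketch of what that mixed configuration could look like in filebeat.yml (the paths shown are illustrative):

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/myapp/*.log   # your own application's logs

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

Events from the raw input just won't carry the module/dataset fields that module-generated events do, so downstream conditionals can still tell them apart.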

BTW, modules start their own inputs under the hood so, at the end of the day, there are inputs configured anyway!

You're right — the configurations shown in that documentation are outdated. So sorry about that and thank you for bringing it to our attention. I've created a PR now to fix that documentation: Updating fields to new ECS names by ycombinator · Pull Request #11807 · elastic/logstash · GitHub.

As I've done in the PR, I'd suggest using [event][module] in place of [fileset][module] and [event][dataset] instead of [fileset][name]. Note that the event.dataset field contains the fully-qualified name of the dataset (aka fileset), so it includes the module name as a prefix, e.g. system.auth.
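Applied to the example config quoted at the top of this thread, the conditionals would become something like this (just the conditional structure; the filters from the published example go inside unchanged):

    filter {
      if [event][module] == "system" {
        if [event][dataset] == "system.auth" {
          # grok/date/geoip filters from the published example go here
        }
      }
    }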

Awesome, thanks for your help. I understand it now. If I could ask one more question:
Taking just the auth portion of the system pipeline, I've enabled it and am getting logs, but some are not being parsed properly while others are. I believe it's because the unparsed ones don't match any of the 7 "match" statements in the example config. I can obviously add more, but I just wanted confirmation that this is to be expected and that I should not expect the example config to catch every possible entry. And if that is true, any idea how many "match" statements I might end up needing to create?
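For what it's worth, for the unmatched lines I've been experimenting with appending my own patterns to the grok filter's match array, something like this (the pattern here is my own illustration, not one of the published seven):

    grok {
      # the seven patterns from the example config, plus one more appended:
      match => { "message" => [
        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:\[%{POSINT:[system][auth][pid]}\])?: %{GREEDYDATA:[system][auth][ssh][event]}"
      ] }
    }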

FYI, if you do change the example to event.dataset, change the value to system.auth.
My logs show that fileset.name = auth still works, so I think either is fine, though admittedly I'm only looking at Ubuntu at the moment.

Would you mind creating a new topic for this, since this topic here is already marked as solved? It just keeps the forums clean and easily searchable for anyone else running into similar problems. Thanks!

Indeed, that's what I meant by the note above that event.dataset contains the fully-qualified dataset name, e.g. system.auth. :slight_smile:

Yes definitely, either can work, at least for now. The reason I prefer event.dataset is that it's a core field in ECS and, as such, more future-proof.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.