Filebeat modules vs filestream input

Hello,

I'm trying to configure Filebeat to read Linux system and auth log files. When I use the datastream input, the data isn't parsed well; everything is left in the message field without any processing.

When I use the "system" module of Filebeat, the data is parsed well. After looking at the JSON metadata output from my Logstash server, I noticed there was no value for target_pipeline:

            "@metadata" => {
                "version" => "8.6.2",
                   "beat" => "filebeat",
                    "now" => 2023-09-11T23:38:24.000Z,
             "index_time" => "2023.09.11",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
             "ip_address" => "10.3.1.3",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "",
                   "type" => "_doc"

But when I use the system module instead of the filestream input, I have a value for target_pipeline:

            "@metadata" => {
                "version" => "8.6.2",
                   "beat" => "filebeat",
                    "now" => 2023-09-11T23:37:28.000Z,
             "index_time" => "2023.09.11",
               "pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
             "ip_address" => "10.3.1.3",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
                   "type" => "_doc"

So I suspect this is why the data is correctly parsed when using the system module rather than the filestream input.

But after looking at the doc for the filestream input, I saw there was a parameter pipeline I can use. So I added this to my filebeat.yml:

pipeline: filebeat-8.6.2-system-syslog-pipeline

In the Logstash output, I can now see a value for target_pipeline:

            "@metadata" => {
                "version" => "8.6.2",
                   "beat" => "filebeat",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
             "index_time" => "2023.09.12",
             "ip_address" => "10.3.1.3",
                   "type" => "_doc",
                    "now" => 2023-09-12T15:43:24.000Z,
               "pipeline" => "filebeat-8.6.2-system-syslog-pipeline"
    },

But the result in Kibana is still not parsed as well as it is with the system module.

I would like to use the filestream inside the filebeat.yml because I have several options to configure for each log file. Here is an example:

# FSCI LOGS
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  pipeline: filebeat-8.6.2-system-syslog-pipeline

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/facultes/FSCI/fsci-secure.log
    - /var/log/facultes/FSCI/fsci-messages.log

  tags: ["FSCI"]
  index: 'filebeat-fsci-8.6.2-sys-linux'

This is the configuration of the filestream input for the FSCI logs. I need to specify different values for paths, tags and index.

Here is the configuration for the FMSS logs:

# FMSS LOGS
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  pipeline: filebeat-8.6.2-system-syslog-pipeline

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/facultes/FMSS/fmss-secure.log
    - /var/log/facultes/FMSS/fmss-messages.log

  tags: ["FMSS"]
  index: 'filebeat-fmss-8.6.2-sys-linux'

I want to be able to configure the values of some parameters per input and then, in Kibana, allow specific users to access their own data based on the index name. But I would also like that data to be parsed correctly by the ingest pipeline, as the system module does.

Here is the Kibana output using filestream:

And here the Kibana output using system module:

As you can see, there is a lot more information when using the module than when using the filestream input.

Maybe it's a bit unclear what I want to achieve here, so if you have any questions, please feel free to ask!

Thank you all and Best Regards,
Yanick

Hi @yquirion

Can you share the output section of your logstash configuration?

Also if the module works why not just use it?

Hi @stephenb,

Thanks for your reply.

The reason I'm not using the module is that I have several source files with specific parameters.

Using datastream, I can specify them like this:

# Logs from FSCI faculty
- type: filestream
  paths:
    - /var/log/facultes/FSCI/fsci-secure.log
    - /var/log/facultes/FSCI/fsci-messages.log
  tags: ["FSCI"]
  index: 'filebeat-fsci-8.6.2-sys-linux'

# Logs from FMSS faculty
- type: filestream
  paths:
    - /var/log/facultes/FMSS/fmss-secure.log
    - /var/log/facultes/FMSS/fmss-messages.log
  tags: ["FMSS"]
  index: 'filebeat-fmss-8.6.2-sys-linux'

I have several faculties/departments whose logs I need to send to different indices (filebeat-fsci-8.6.2-sys-linux vs filebeat-fmss-8.6.2-sys-linux).

With the system module, I don't think I can specify different index names/tags the way I can in the filebeat.yml config file.

I've configured the system module like this:

# Module: system
# Docs: https://www.elastic.co/guide/en/beats/filebeat/8.6/filebeat-module-system.html

- module: system
  # Syslog
  syslog:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: 
      - /var/log/facultes/FSCI/fsci-messages.log

  # Authorization logs
  auth:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths:
      - /var/log/facultes/FSCI/fsci-secure.log

Based on the documentation, there is no tags or index parameter available in a module: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-system.html

In the system module conf file (modules.d/system.yml), I could add my second faculty log file like this:

  # Syslog
  syslog:
    var.paths: 
      - /var/log/facultes/FSCI/fsci-messages.log
      - /var/log/facultes/FMSS/fmss-messages.log

  # Authorization logs
  auth:
    var.paths:
      - /var/log/facultes/FSCI/fsci-secure.log
      - /var/log/facultes/FMSS/fmss-secure.log

But if I do this, I will need to set the index name in my Logstash filter. Yes, I can do it, but if there is a better way to achieve this, I would rather avoid setting up a Logstash filter for it.

Regarding my Logstash configuration, I do not have anything configured when using the datastream input, and the same goes for the system module. The only reason I need a filter with the system module is to set these specific parameters:

  • tags
  • index

Here is my logstash code that will set the index name:

  ####################
  # LOG DES FACULTES #
  ####################
  if [host][name] == "syslogf.sti.usherbrooke.ca" {
    if ( [log][file][path] =~ /FSCI/ ) {
      mutate {
        add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-fsci-%{[@metadata][version]}-sys-linux" }
      }
    } else if ( [log][file][path] =~ /FMSS/ ) {
      mutate {
        add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-fmss-%{[@metadata][version]}-sys-linux" }
      }
    } else {
      mutate {
        add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-kago-%{[@metadata][version]}-sys-linux" }
      }
    }
  }

But I don't like playing with @metadata fields like I'm doing here. Also, it would be easier to maintain everything from the filebeat.yml file rather than having to modify the pipeline each time a new faculty or department wants to use our ELK platform.

Hope this helps you understand my needs!

Regards,
Yanick

Understood on the modules vs inputs... there probably is a way to set up indices and tags, but let's skip that for now... what you are trying to do should work.

However, you did not share the output section of your logstash.conf, which is a very important part of this conversation.

On the logic: if you set tags in Filebeat, you can use them in Logstash to do the conditional logic above, probably in a cleaner way... and I'm not sure why you are using "raw_index".

In general, it is best to share the whole conf if you can... piecemeal leads to assumptions.

So perhaps share your whole logstash.conf

And just for clarification: you are referring to a filestream input, which is different from a datastream, which is an abstraction of time series data / indices in Elasticsearch. I fixed the title.

Hi @stephenb,

I'm sorry, I forgot to add this to my previous reply. Here is my .conf pipeline on my Logstash server:

input {
  beats {
    port => 5554
    ssl => false
    tags => [ "fileoutput" ]
  }
}

filter {
  ####################
  # LOG DES FACULTES #
  ####################
  # if [host][name] == "syslogf.sti.usherbrooke.ca" {
  #   #if ( [container][image][version] =~ /v1.*/ ) {
  #   if ( [log][file][path] =~ /FSCI/ ) {
  #     mutate {
  #       add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-fsci-%{[@metadata][version]}-sys-linux" }
  #     }
  #   } else {
  #     mutate {
  #       add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-kago-%{[@metadata][version]}-sys-linux" }
  #     }      
  #   }
  # }

  # Get current date for index name with EST date
  mutate { add_field => { "[@metadata][now]" => "%{+YYYY.MM.dd HH:mm:ss}" } }

  # Common configs for supporting modules or inputs
  # Set index name ("raw_index" for filebeat inputs, "beat" for filebeat modules)
  if [@metadata][raw_index] {
    mutate { copy => { "[@metadata][raw_index]" => "[@metadata][target_index]" } }
  } else {
    mutate { add_field => { "[@metadata][target_index]" => "%{[@metadata][beat]}" } }
  }

  # Set pipeline name (set an empty string if not defined...)
  if [@metadata][pipeline] {
    mutate { add_field => { "[@metadata][target_pipeline]" => "%{[@metadata][pipeline]}" } }
  } else {
    mutate { add_field => { "[@metadata][target_pipeline]" => "" } }
  }

  date {
    match => [ "[@metadata][now]", "YYYY.MM.dd HH:mm:ss" ]
    target => "[@metadata][now]"
    timezone => "UTC"
  }

  ruby {
    init => "
      require 'tzinfo'
      require 'time'
    "
    code => "
      tz = TZInfo::Timezone.get('America/Montreal')
      current_time = Time.parse(event.get('[@metadata][now]').utc.to_s)
      event.set('[@metadata][index_time]', tz.to_local(current_time).strftime('%Y.%m.%d').to_s)
    "
  }
}

output {
  # Add a tag "fileoutput" to the filebeat source to send to a file (for tests or debuging) or "fileonly" to avoid sending to the cluster ELK
  if "fileoutput" in [tags] or "fileonly" in [tags] {
    file {
      path => "/data/logstash/%{[@metadata][target_index]}-%{+YYYY.MM.dd}"
      codec => rubydebug {
        metadata => true
      }
    }
  }

  if "fileonly" not in [tags] {
    elasticsearch {
      hosts => [ "${coordinator_node_0}", "${coordinator_node_1}" ]
      ssl => true
      cacert => "/etc/logstash/certs/UdeS-Chain-Base64.crt"
      sniffing => false

      manage_template => false

      #index => "%{[@metadata][target_index]}-%{+YYYY.MM.dd}"
      index => "%{[@metadata][target_index]}-%{[@metadata][index_time]}"
      pipeline => "%{[@metadata][target_pipeline]}"

      user => "logstash_user"
      password => "${logstash_user_pwd}"
    }
  }
}

The "LOG DES FACULTES" section is commented out because I'm currently using the filestream input from my filebeat.yml, and the system module has been disabled.

As you can see in the output section, we build the index name this way:

index => "%{[@metadata][target_index]}-%{[@metadata][index_time]}"

Same with the pipeline:

pipeline => "%{[@metadata][target_pipeline]}"

Those parameters are set in the filter section, based on the information available from the beat.
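As a rough illustration of what the date and ruby filters above compute for index_time, here is a minimal Python sketch (it is not part of the Logstash config). It converts a UTC timestamp to a local calendar date, which is why an event stamped shortly after midnight UTC can still land in the previous day's index. The function name index_time_for and the fixed-offset fallback are assumptions made for this example only.

```python
from datetime import datetime, timedelta, timezone

try:
    # Same zone name as in the ruby filter; needs a tz database on the host.
    from zoneinfo import ZoneInfo
    LOCAL_TZ = ZoneInfo("America/Montreal")
except Exception:
    # Assumption for this sketch: fall back to a fixed UTC-4 offset (EDT).
    # This is only correct for summer-time dates such as the samples below.
    LOCAL_TZ = timezone(timedelta(hours=-4))


def index_time_for(utc_iso):
    """Return the local date (YYYY.MM.dd) used as the index suffix."""
    utc = datetime.strptime(utc_iso, "%Y-%m-%dT%H:%M:%S.%fZ")
    utc = utc.replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL_TZ).strftime("%Y.%m.%d")


# 00:11 UTC on Sep 12 is still the evening of Sep 11 in Montreal.
print(index_time_for("2023-09-12T00:11:52.000Z"))
print(index_time_for("2023-09-12T15:31:34.000Z"))
```

The first call corresponds to a "now" of 2023-09-12T00:11:52.000Z, which ends up with an index_time of 2023.09.11.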

Thanks again for your help! I appreciate it!

Regards,
Yanick

@yquirion take a look at this

I see this in yours...

pipeline => "%{[@metadata][target_pipeline]}"

should be

pipeline => "%{[@metadata][pipeline]}"

Which gets set automatically when you set a pipeline in Beats

Also, you do not need to set the index the way you are but if that is working it is fine

Also did you just try to "hard code" the pipeline to see if you are actually executing one?

Hi @stephenb,

In the filter section I have the following:

  # Set pipeline name (set an empty string if not defined...)
  if [@metadata][pipeline] {
    mutate { add_field => { "[@metadata][target_pipeline]" => "%{[@metadata][pipeline]}" } }
  } else {
    mutate { add_field => { "[@metadata][target_pipeline]" => "" } }
  }

So there will always be a value in pipeline => "%{[@metadata][target_pipeline]}"; it is only an empty string if [@metadata][pipeline] doesn't exist.

To be honest with you, I don't remember why we did that when we started using this product, but I think we had some kind of issue. The guy who made this is now retired, and I never changed it because it is working fine.

Looking at a JSON output when using filestream, we can see the value is correctly set from what I put in the filebeat.yml config file:

{
                "input" => {
        "type" => "filestream"
    },
                "agent" => {
                "name" => "syslogf.sti.domain.com",
                  "id" => "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
        "ephemeral_id" => "acd32a49-6544-4c1c-b57a-97acb3c77f4d",
                "type" => "filebeat",
             "version" => "8.6.2"
    },
           "@timestamp" => 2023-09-12T15:31:34.270Z,
                  "ecs" => {
        "version" => "8.0.0"
    },
                  "log" => {
          "file" => {
            "path" => "/var/log/facultes/FSCI/fsci-secure.log"
        },
        "offset" => 21099
    },
            "@metadata" => {
                "version" => "8.6.2",
                   "beat" => "filebeat",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
             "index_time" => "2023.09.12",
             "ip_address" => "10.32.14.38",
                   "type" => "_doc",
                    "now" => 2023-09-12T15:31:34.000Z,
               "pipeline" => "filebeat-8.6.2-system-syslog-pipeline"
    },
             "@version" => "1",
                 "host" => {
        "name" => "syslogf.sti.domain.com"
    },
              "message" => "Sep 12 11:31:28 dinf-miro sshd[13886]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=myuser",
                 "tags" => [
        [0] "audit",
        [1] "syslog",
        [2] "FSCI",
        [4] "fileoutput",
        [5] "beats_input_codec_plain_applied"
    ]
}

As you can see, the field "@metadata" => { "pipeline" => "filebeat-8.6.2-system-syslog-pipeline" } has the same value as "@metadata" => { "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline" }

The tags also contain "FSCI", and the index name also has the correct value:

    "tags" => [
        [0] "audit",
        [1] "syslog",
        [2] "FSCI",
        [4] "fileoutput",
        [5] "beats_input_codec_plain_applied"
    ]

"target_index" => "filebeat-fsci-8.6.2-sys-linux"

I just don't understand why the ingest pipeline is not doing its job correctly when I'm not using the system module.

To help you understand, here are two JSON outputs from my Logstash server. The first one is using filestream, and the second one is using the system module. Both outputs are for the same kind of event (authentication success).

filestream:

{
                "input" => {
        "type" => "filestream"
    },
                "agent" => {
                "name" => "syslogf.domain.com",
                  "id" => "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
        "ephemeral_id" => "acd32a49-6544-4c1c-b57a-97acb3c77f4d",
                "type" => "filebeat",
             "version" => "8.6.2"
    },
           "@timestamp" => 2023-09-12T15:31:34.270Z,
                  "ecs" => {
        "version" => "8.0.0"
    },
                  "log" => {
          "file" => {
            "path" => "/var/log/facultes/FSCI/fsci-secure.log"
        },
        "offset" => 21099
    },
            "@metadata" => {
                "version" => "8.6.2",
                   "beat" => "filebeat",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
             "index_time" => "2023.09.12",
             "ip_address" => "10.32.14.38",
                   "type" => "_doc",
                    "now" => 2023-09-12T15:31:34.000Z,
               "pipeline" => "filebeat-8.6.2-system-syslog-pipeline"
    },
             "@version" => "1",
                 "host" => {
        "name" => "syslogf.domain.com"
    },
              "message" => "Sep 12 11:31:28 dinf-miro sshd[13886]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
                 "tags" => [
        [0] "audit",
        [1] "syslog",
        [2] "FSCI",
        [3] "grostata",
        [4] "fileoutput",
        [5] "beats_input_codec_plain_applied"
    ]
}

system module:

{
                "agent" => {
                "name" => "syslogf.domain.com",
                  "id" => "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type" => "filebeat",
        "ephemeral_id" => "4bfe037d-7791-4ced-98c7-302b1d4b2433",
             "version" => "8.6.2"
    },
                  "log" => {
          "file" => {
            "path" => "/var/log/facultes/FSCI/fsci-secure.log"
        },
        "offset" => 7987
    },
            "@metadata" => {
                "version" => "8.6.2",
               "pipeline" => "filebeat-8.6.2-system-auth-pipeline",
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-auth-pipeline",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
             "index_time" => "2023.09.11",
             "ip_address" => "10.32.14.38",
                   "type" => "_doc",
                    "now" => 2023-09-12T00:11:52.000Z,
                   "beat" => "filebeat"
    },
              "fileset" => {
        "name" => "auth"
    },
              "message" => "Sep 11 20:11:50 dinf-miro sshd[2838516]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
                 "tags" => [
        [0] "audit",
        [1] "syslog",
        [2] "fileoutput",
        [3] "beats_input_codec_plain_applied"
    ],
                "input" => {
        "type" => "log"
    },
           "@timestamp" => 2023-09-12T00:11:52.860Z,
                  "ecs" => {
        "version" => "8.0.0"
    },
              "service" => {
        "type" => "system"
    },
                 "host" => {
        "name" => "syslogf.domain.com"
    },
             "@version" => "1",
                "event" => {
        "timezone" => "-04:00",
          "module" => "system",
         "dataset" => "system.auth"
    }
}

Right now, I'm using the system module. To make sure the index name is correct, I enabled this in my Logstash pipeline:

      if ( [log][file][path] =~ /FSCI/ ) {
        mutate {
          add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-fsci-%{[@metadata][version]}-sys-linux" }
        }
      } else if ( [log][file][path] =~ /FMSS/ ) {
        mutate {
          add_field => { "[@metadata][raw_index]" => "%{[@metadata][beat]}-fmss-%{[@metadata][version]}-sys-linux" }
        }      
      }

I'm looking at the field [log][file][path] to determine the index name. When using the system module, the field [@metadata][raw_index] is empty, so I add the new field using mutate -> add_field.

P.S. I wrote this reply yesterday, but with all the other duties I have, I never pressed the "reply" button! :rofl:

Regards,
Yanick

Hello again @stephenb,

I tried your suggestion of "hardcoding" the pipeline parameter into the output section of my LS pipeline:

pipeline => "filebeat-8.6.2-system-auth-pipeline"

However, I got the same result as when I'm using this:

pipeline => "%{[@metadata][target_pipeline]}"

JSON output of a record:

{
                "input" => {
        "type" => "filestream"
    },
                "agent" => {
                "name" => "syslogf.sti.usherbrooke.ca",
                  "id" => "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type" => "filebeat",
        "ephemeral_id" => "7e18fc20-1016-4ac4-a817-99473319966d",
             "version" => "8.6.2"
    },
           "@timestamp" => 2023-09-14T11:56:37.000Z,
                  "ecs" => {
        "version" => "8.0.0"
    },
                  "log" => {
        "syslog" => {
            "hostname" => "dinf-miro",
             "appname" => "sshd",
              "procid" => "107572"
        }
    },
            "@metadata" => {
                "version" => "8.6.2",
             "index_time" => "2023.09.14",
               "pipeline" => "filebeat-8.6.2-system-syslog-pipeline",
             "ip_address" => "10.3.1.3",
              "raw_index" => "filebeat-fsci-8.6.2-sys-linux",
                   "beat" => "filebeat",
                   "type" => "_doc",
                    "now" => 2023-09-14T11:56:37.000Z,
           "target_index" => "filebeat-fsci-8.6.2-sys-linux",
        "target_pipeline" => "filebeat-8.6.2-system-syslog-pipeline"
    },
             "@version" => "1",
                 "host" => {
        "name" => "syslogf.domain.com"
    },
              "message" => "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
                 "tags" => [
        [0] "audit",
        [1] "syslog",
        [2] "FSCI",
        [3] "fileoutput",
        [4] "beats_input_codec_plain_applied"
    ]
}

Thanks,
Yanick

Find that document in Discover, copy its JSON, and run it through the _simulate API with verbose=true to see what the pipeline is and is not doing... I suspect it is not processing the message correctly.

Post what you see
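A minimal Python sketch of preparing such a _simulate request, assuming you only want to test the message field. The helper name build_simulate_request is hypothetical; the endpoint path and the docs/_source body shape follow the Elasticsearch simulate pipeline API. The sketch only builds the request; sending it (for example by pasting it into Kibana Dev Tools) is left out.

```python
import json


def build_simulate_request(pipeline, messages, verbose=True):
    """Return (path, body) for a POST to the simulate pipeline API."""
    path = f"/_ingest/pipeline/{pipeline}/_simulate"
    if verbose:
        # verbose=true reports the result of every processor in the pipeline
        path += "?verbose=true"
    # One test document per message; only the message field is populated.
    body = {"docs": [{"_source": {"message": m}} for m in messages]}
    return path, json.dumps(body, indent=2)


path, body = build_simulate_request(
    "filebeat-8.6.2-system-syslog-pipeline",
    ["pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 "
     "tty=ssh ruser= rhost=10.3.1.2 user=toto"],
)
print("POST", path)
print(body)
```

Testing with only the message field keeps the request short; the full document from Discover can be used as _source instead if other fields matter.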

Can you share some of the raw log lines...

It does not appear they are in syslog format and thus the pipeline is failing... see all this below.

So the problem is that the syslog part of the message has already been stripped off, or was never added, which is not what the pipeline is expecting.

If you GET the pipeline and look at the Grok, you will see it is expecting the syslog timestamp, host, process, etc. Your message has none of that.

In Discover you will see an error.message, something like

                "message": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]"

Here is the GROK

    {
      "grok": {
        "patterns": [
          "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}",
          "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{GREEDYMULTILINE:system.syslog.message}",
          "%{TIMESTAMP_ISO8601:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}"
        ],
        "pattern_definitions": {
          "GREEDYMULTILINE": "(.|\n)*"
        },
        "ignore_missing": true,
        "field": "message"
      }
    }

so to me, it looks like those log lines are not in syslog format
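To make the mismatch concrete, here is a rough Python approximation of the first Grok pattern above. The regular expression is a hand-written stand-in for SYSLOGTIMESTAMP, SYSLOGHOST and the rest, not the real Grok definitions, and parse_syslog is a name made up for this sketch.

```python
import re

# Approximation of:
#   %{SYSLOGTIMESTAMP} %{SYSLOGHOST} %{DATA:process.name}(?:\[%{POSINT}\])?: <rest>
SYSLOG_LINE = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<process>.*?)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)",
    re.DOTALL,
)


def parse_syslog(line):
    """Return the captured fields, or None if the syslog header is missing."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


full = ("Sep 12 11:31:28 dinf-miro sshd[13886]: pam_sss(sshd:auth): "
        "authentication success; logname= uid=0 euid=0 tty=ssh ruser= "
        "rhost=10.3.1.2 user=myuser")
stripped = ("pam_sss(sshd:auth): authentication success; logname= uid=0 "
            "euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto")

print(parse_syslog(full))      # header fields are captured
print(parse_syslog(stripped))  # None: no timestamp/host/process prefix
```

A message that no longer starts with the timestamp, host and process cannot match, which is the same reason the Grok processor reports an error.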

Here is a sample _simulate of the pipeline and the result

POST _ingest/pipeline/filebeat-8.6.2-system-syslog-pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "input": {
          "type": "filestream"
        },
        "agent": {
          "name": "syslogf.sti.usherbrooke.ca",
          "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
          "type": "filebeat",
          "ephemeral_id": "7e18fc20-1016-4ac4-a817-99473319966d",
          "version": "8.6.2"
        },
        "@timestamp": "2023-09-14T11: 56: 37.000Z",
        "ecs": {
          "version": "8.0.0"
        },
        "log": {
          "syslog": {
            "hostname": "dinf-miro",
            "appname": "sshd",
            "procid": "107572"
          }
        },
        "@metadata": {
          "version": "8.6.2",
          "index_time": "2023.09.14",
          "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
          "ip_address": "10.3.1.3",
          "raw_index": "filebeat-fsci-8.6.2-sys-linux",
          "beat": "filebeat",
          "type": "_doc",
          "now": "2023-09-14T11: 56: 37.000Z",
          "target_index": "filebeat-fsci-8.6.2-sys-linux",
          "target_pipeline": "filebeat-8.6.2-system-syslog-pipeline"
        },
        "@version": "1",
        "host": {
          "name": "syslogf.domain.com"
        },
        "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
        "tags": [
          "audit",
          "syslog",
          "FSCI",
          "fileoutput",
          "beats_input_codec_plain_applied"
        ]
      }
    }
  ]
}

#Results

{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_version": "-3",
            "_id": "_id",
            "_source": {
              "input": {
                "type": "filestream"
              },
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "7e18fc20-1016-4ac4-a817-99473319966d",
                "version": "8.6.2"
              },
              "@timestamp": "2023-09-14T11: 56: 37.000Z",
              "ecs": {
                "version": "8.0.0"
              },
              "log": {
                "syslog": {
                  "procid": "107572",
                  "hostname": "dinf-miro",
                  "appname": "sshd"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "target_index": "filebeat-fsci-8.6.2-sys-linux",
                "target_pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "index_time": "2023.09.14",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "now": "2023-09-14T11: 56: 37.000Z",
                "beat": "filebeat",
                "ip_address": "10.3.1.3",
                "type": "_doc",
                "version": "8.6.2"
              },
              "@version": "1",
              "host": {
                "name": "syslogf.domain.com"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
              "event": {
                "ingested": "2023-09-14T18:10:35.213761622Z"
              },
              "tags": [
                "audit",
                "syslog",
                "FSCI",
                "fileoutput",
                "beats_input_codec_plain_applied"
              ]
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T18:10:35.213761622Z"
            }
          }
        },
        {
          "processor_type": "grok", <!--------------- HERE
          "status": "error", <!-------- HERE
          "error": {
            "root_cause": [
              {
                "type": "illegal_argument_exception",
                "reason": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]"
              }
            ],
            "type": "illegal_argument_exception",
            "reason": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]"
          }
        },
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_version": "-3",
            "_id": "_id",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "7e18fc20-1016-4ac4-a817-99473319966d",
                "version": "8.6.2"
              },
              "log": {
                "syslog": {
                  "procid": "107572",
                  "hostname": "dinf-miro",
                  "appname": "sshd"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "target_index": "filebeat-fsci-8.6.2-sys-linux",
                "target_pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "index_time": "2023.09.14",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "now": "2023-09-14T11: 56: 37.000Z",
                "beat": "filebeat",
                "ip_address": "10.3.1.3",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
              "error": {
                "message": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]"
              },
              "tags": [
                "audit",
                "syslog",
                "FSCI",
                "fileoutput",
                "beats_input_codec_plain_applied"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T11: 56: 37.000Z",
              "ecs": {
                "version": "8.0.0"
              },
              "@version": "1",
              "host": {
                "name": "syslogf.domain.com"
              },
              "event": {
                "ingested": "2023-09-14T18:10:35.213761622Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "on_failure_pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "on_failure_message": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]",
              "on_failure_processor_tag": null,
              "timestamp": "2023-09-14T18:10:35.213761622Z",
              "on_failure_processor_type": "grok"
            }
          }
        }
      ]
    }
  ]
}

Hi @stephenb,

Here is a raw log line, taken directly from the file that Filebeat reads and sends to Logstash:

Sep 14 07:56:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser

I'm not sure how to run it through the ingest pipeline. I will look at the doc you sent me.

Thanks!
Yanick

I am a little confused ...

Some messages above show

"message" => "Sep 12 11:31:28 dinf-miro sshd[13886]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=myuser",
       

Some show a message that is already half parsed

   "host" => {
        "name" => "syslogf.domain.com"
    },
              "message" => "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
                 "tags" => [

and then the raw

Sep 14 07:56:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser

So you can see from this run of the pipeline on each one: the first fails, and the other two, which are full syslog messages, parse correctly. So why is some of the parsing already done, and what is doing it?

POST _ingest/pipeline/filebeat-8.6.2-system-syslog-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto"
      }
    },
    {
      "_source": {
              "message" : "Sep 12 11:31:28 dinf-miro sshd[13886]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=myuser"
      }
    },
    {
      "_source": {
              "message" : "Sep 14 07:56:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser"
      }
    }
  ]
}

# Result

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto",
          "event": {
            "ingested": "2023-09-14T20:57:54.932933914Z"
          },
          "error": {
            "message": "Provided Grok expressions do not match field value: [pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=toto]"
          }
        },
        "_ingest": {
          "timestamp": "2023-09-14T20:57:54.932933914Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "process": {
            "name": "sshd",
            "pid": 13886
          },
          "system": {
            "syslog": {}
          },
          "@timestamp": "2023-09-12T11:31:28.000Z",
          "related": {
            "hosts": [
              "dinf-miro"
            ]
          },
          "host": {
            "hostname": "dinf-miro"
          },
          "event": {
            "ingested": "2023-09-14T20:57:54.932958581Z",
            "kind": "event"
          },
          "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=myuser"
        },
        "_ingest": {
          "timestamp": "2023-09-14T20:57:54.932958581Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "process": {
            "name": "sshd",
            "pid": 107572
          },
          "system": {
            "syslog": {}
          },
          "@timestamp": "2023-09-14T07:56:37.000Z",
          "related": {
            "hosts": [
              "dinf-miro"
            ]
          },
          "host": {
            "hostname": "dinf-miro"
          },
          "event": {
            "ingested": "2023-09-14T20:57:54.932964779Z",
            "kind": "event"
          },
          "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser"
        },
        "_ingest": {
          "timestamp": "2023-09-14T20:57:54.932964779Z"
        }
      }
    }
  ]
}
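
A rough way to see why the first document fails while the other two parse: the grok step appears to expect the leading syslog header (timestamp, host, process[pid]:), and the first message no longer has it. The Python sketch below only approximates that idea with a plain regular expression. The pattern and the name parse_syslog_line are assumptions made for illustration, not the grok patterns the pipeline actually contains.

```python
import re

# Approximation of a classic syslog line: "Sep 14 07:56:37 host proc[pid]: message".
# This is NOT the pipeline's grok pattern, only a stand-in to show the idea.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return the header fields and the remaining message, or None if the header is missing."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields
```

A full line such as the "Sep 14 07:56:37 dinf-miro sshd[107572]: ..." example yields a dict with timestamp, hostname, process, pid and message, while a message that already starts at "pam_sss(sshd:auth): ..." returns None, which mirrors the grok failure shown for the first document.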

Hi @stephenb,

I worked out how to parse the data using the ingest simulation.

I have this raw line:

Sep 14 07:56:37 dinf-miro sshd[107572]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser

I called the API like this:

POST _ingest/pipeline/filebeat-8.6.2-system-syslog-pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "message": "Sep 14 07:56:37 dinf-miro sshd[107572]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
      }
    }
  ]
}

Here is the result:

{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "message": "Sep 14 07:56:37 dinf-miro sshd[107572]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser",
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "grok",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "host": {
                "hostname": "dinf-miro"
              },
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {
                  "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser",
                  "timestamp": "Sep 14 07:56:37"
                }
              },
              "message": "Sep 14 07:56:37 dinf-miro sshd[107572]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser",
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "remove",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "host": {
                "hostname": "dinf-miro"
              },
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {
                  "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser",
                  "timestamp": "Sep 14 07:56:37"
                }
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "rename",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "host": {
                "hostname": "dinf-miro"
              },
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {
                  "timestamp": "Sep 14 07:56:37"
                }
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              },
              "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "date",
          "status": "success",
          "if": {
            "condition": "ctx.event.timezone == null",
            "result": true
          },
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {
                  "timestamp": "Sep 14 07:56:37"
                }
              },
              "@timestamp": "2023-09-14T07:56:37.000Z",
              "host": {
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              },
              "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "date",
          "status": "skipped",
          "if": {
            "condition": "ctx.event.timezone != null",
            "result": false
          }
        },
        {
          "processor_type": "remove",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {}
              },
              "@timestamp": "2023-09-14T07:56:37.000Z",
              "host": {
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z"
              },
              "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {}
              },
              "@timestamp": "2023-09-14T07:56:37.000Z",
              "host": {
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z",
                "kind": "event"
              },
              "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        },
        {
          "processor_type": "append",
          "status": "success",
          "if": {
            "condition": "ctx.host?.hostname != null && ctx.host?.hostname != ''",
            "result": true
          },
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "process": {
                "name": "sshd",
                "pid": 107572
              },
              "system": {
                "syslog": {}
              },
              "@timestamp": "2023-09-14T07:56:37.000Z",
              "related": {
                "hosts": [
                  "dinf-miro"
                ]
              },
              "host": {
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T20:50:04.699887827Z",
                "kind": "event"
              },
              "message": "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2  user=someuser"
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T20:50:04.699887827Z"
            }
          }
        }
      ]
    }
  ]
}
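
The verbose output above is long, so here is a small Python sketch that condenses a _simulate?verbose=true response into the fields each processor added or removed. It assumes the response has already been loaded into a dict (for example with json.loads). The helper names flatten and field_changes are made up for this illustration.

```python
def flatten(source, prefix=""):
    """Return the set of dotted field paths present in a nested dict."""
    paths = set()
    for name, value in source.items():
        path = prefix + name
        if isinstance(value, dict) and value:
            paths |= flatten(value, path + ".")
        else:
            # Leaf value, or an empty object such as "syslog": {}
            paths.add(path)
    return paths


def field_changes(simulate_response):
    """List (processor_type, status, added, removed) for every processor step."""
    report = []
    for doc in simulate_response.get("docs", []):
        # The input document is not part of the response, so the first
        # step reports all of its fields as "added".
        previous = set()
        for step in doc.get("processor_results", []):
            if "doc" not in step:
                # Skipped processors carry no document snapshot.
                report.append((step.get("processor_type"), step.get("status"), [], []))
                continue
            current = flatten(step["doc"]["_source"])
            report.append((
                step.get("processor_type"),
                step.get("status"),
                sorted(current - previous),
                sorted(previous - current),
            ))
            previous = current
    return report
```

Reading the result step by step makes it easier to check whether any processor ever adds a user-related field, instead of scanning every full document snapshot by eye.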

I don't understand why the user=someuser hasn't been parsed. When using the module, that information goes into [related][user].

But I think I may have something. Let me run some more tests and I'll get back to you.

Thanks,
Yanick

Hi @stephenb,

You're right, Filebeat is probably adding something when it processes the raw log line. I should try to capture the data that Filebeat sends to the Logstash server. I'll see whether I can add an output to a JSON file in my filebeat.yml.

Otherwise I will use tcpdump to capture it.

I'll get back to you shortly.

Yanick

I will need to check; I do not see anything in the system-syslog pipeline that addresses that...

You can just go to Stack Management -> Ingest Pipelines and look at the pipeline.

Something basic is going on. I will run your message through the whole processing chain later and see what I get.

So I just ran this file through the system syslog module directly from Filebeat to Elasticsearch
Just Filebeat (system module) -> Elasticsearch

Sep 14 07:56:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.2 user=someuser
Sep 14 07:57:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.3 user=albert
Sep 14 07:58:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.4 user=fred
Sep 14 07:59:37 dinf-miro sshd[107572]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.5 user=extra

system.yml

  syslog:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: [ "/Users/sbrown/workspace/sample-data/discuss/discuss-syslog.log" ]

And the resulting documents look like this. I do not see the user parsed...

(not that it is that hard to parse if needed)

{
  "_index": ".ds-filebeat-8.6.2-2023.09.14-000001",
  "_id": "pVOKlYoBcNlXK5e6MLwv",
  "_version": 1,
  "_score": 0,
  "_source": {
    "container": {
      "id": "discuss"
    },
    "agent": {
      "name": "hyperion",
      "id": "b6e1bdd5-6b0f-4883-9b30-a2a79c9c30c9",
      "ephemeral_id": "536915ef-db63-4e94-b60c-32c05ac1cef8",
      "type": "filebeat",
      "version": "8.6.2"
    },
    "process": {
      "name": "sshd",
      "pid": 107572
    },
    "log": {
      "file": {
        "path": "/Users/sbrown/workspace/sample-data/discuss/discuss-syslog.log"
      },
      "offset": 444
    },
    "fileset": {
      "name": "syslog"
    },
    "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.3.1.5 user=extra",
    "input": {
      "type": "log"
    },
    "@timestamp": "2023-09-14T07:59:37.000-07:00",
    "system": {
      "syslog": {}
    },
    "ecs": {
      "version": "1.12.0"
    },
    "related": {
      "hosts": [
        "dinf-miro"
      ]
    },
    "service": {
      "type": "system"
    },
    "host": {
      "hostname": "dinf-miro",
      "os": {
        "build": "22G91",
        "kernel": "22.6.0",
        "name": "macOS",
        "family": "darwin",
        "type": "macos",
        "version": "13.5.2",
        "platform": "darwin"
      },
      "ip": [
        "fe80::aede:48ff:fe00:1122",
        "192.168.86.90",
        "fe80::d0:5a35:3a1:bd01",
        "fd32:2d42:dd7:f422:1803:d903:ef7c:38b9",
        "fe80::c096:f8ff:fee1:ccb9",
        "fe80::c096:f8ff:fee1:ccb9",
        "fe80::5e4:da38:fb3c:efec",
        "fe80::e130:f2e7:342d:eff8",
        "fe80::ce81:b1c:bd2c:69e",
        "fe80::142c:e96:ccc2:a167",
        "192.168.2.107"
      ],
      "name": "hyperion",
      "id": "9E46F076-B7F1-53AA-921B-C2F983746B79",
      "mac": [
        "5C-52-30-9C-EF-E0",
        "7E-52-30-9C-EF-E0",
        "82-B2-58-49-30-00",
        "82-B2-58-49-30-01",
        "82-B2-58-49-30-04",
        "82-B2-58-49-30-05",
        "A0-CE-C8-51-95-38",
        "AC-DE-48-00-11-22",
        "C2-96-F8-E1-CC-B9"
      ],
      "architecture": "x86_64"
    },
    "event": {
      "ingested": "2023-09-14T21:12:56.110794259Z",
      "timezone": "-07:00",
      "kind": "event",
      "module": "system",
      "dataset": "system.syslog"
    }
  }
}

I tried with the auth module too and the user was not parsed there either, so it is unclear how you expected that to be parsed or what you did to parse it.

Is there other logic in your Logstash...?

There is something going on... it is unclear to me.
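
On the remark that the user is not hard to parse if needed: a minimal Python sketch of the parsing logic, assuming the PAM message keeps the "key=value" layout seen in the sample lines. The function name parse_pam_fields is hypothetical, and where the values should end up (for example a user or source IP field) is left open here.

```python
import re

# Matches key=value pairs where the value may be empty, e.g. "logname=" or "user=fred".
PAM_PAIR = re.compile(r"(\w+)=(\S*)")


def parse_pam_fields(message):
    """Extract the key=value pairs that follow the '; ' separator in a PAM message."""
    # Everything before the first "; " is the module name and the outcome text.
    _, _, tail = message.partition("; ")
    return {key: value for key, value in PAM_PAIR.findall(tail)}
```

For the sample lines in this thread this returns entries such as user and rhost, with empty strings for logname and ruser. The same idea could be expressed as an extra step in the ingest pipeline or in Logstash; the Python version only illustrates the logic.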

Hi again!

Here is the file output from my Filebeat. I don't know whether it is transferred to the Logstash server in that format (JSON), though.

{
  "@timestamp": "2023-09-14T21:14:08.829Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.6.2",
    "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
    "raw_index": "filebeat-fsci-8.6.2-sys-linux"
  },
  "agent": {
    "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
    "name": "syslogf.sti.usherbrooke.ca",
    "type": "filebeat",
    "version": "8.6.2",
    "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "log": {
    "file": {
      "path": "/var/log/facultes/FSCI/fsci-secure.log"
    },
    "offset": 11246
  },
  "message": "Sep 14 17:14:05 dinf-miro sshd[120773]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
  "tags": [
    "audit",
    "syslog",
    "FSCI"
  ],
  "input": {
    "type": "filestream"
  },
  "host": {
    "name": "syslogf.sti.usherbrooke.ca"
  }
}

Then I tried to run this through the ingest simulator.

The input:

POST _ingest/pipeline/filebeat-8.6.2-system-syslog-pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2023-09-14T21:14:08.829Z",
        "@metadata": {
          "beat": "filebeat",
          "type": "_doc",
          "version": "8.6.2",
          "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
          "raw_index": "filebeat-fsci-8.6.2-sys-linux"
        },
        "agent": {
          "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
          "name": "syslogf.sti.usherbrooke.ca",
          "type": "filebeat",
          "version": "8.6.2",
          "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce"
        },
        "ecs": {
          "version": "8.0.0"
        },
        "log": {
          "file": {
            "path": "/var/log/facultes/FSCI/fsci-secure.log"
          },
          "offset": 11246
        },
        "message": "Sep 14 17:14:05 dinf-miro sshd[120773]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
        "tags": [
          "audit",
          "syslog",
          "FSCI"
        ],
        "input": {
          "type": "filestream"
        },
        "host": {
          "name": "syslogf.sti.usherbrooke.ca"
        }
      }
    }
  ]
}
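
For repeated tests, a request body like the one above can be generated instead of pasted by hand. This is a sketch under assumptions: it expects the captured events as text with one JSON object per line (pretty-printed, multi-line events would need different handling), and the function name build_simulate_body and the drop_metadata switch are invented for this example.

```python
import json


def build_simulate_body(ndjson_text, drop_metadata=False):
    """Wrap captured Filebeat events into a body for the _simulate endpoint."""
    docs = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between events
        event = json.loads(line)
        if drop_metadata:
            # Optionally leave out @metadata so only the event fields are simulated.
            event.pop("@metadata", None)
        docs.append({"_source": event})
    return {"docs": docs}
```

The returned dict can be serialized with json.dumps and sent as the body of the POST request. With the default drop_metadata=False it mirrors the request above, which kept @metadata inside _source.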

The output:

{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "input": {
                "type": "filestream"
              },
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "@timestamp": "2023-09-14T21:14:08.829Z",
              "ecs": {
                "version": "8.0.0"
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca"
              },
              "message": "Sep 14 17:14:05 dinf-miro sshd[120773]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              },
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ]
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "grok",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "Sep 14 17:14:05 dinf-miro sshd[120773]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T21:14:08.829Z",
              "system": {
                "syslog": {
                  "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
                  "timestamp": "Sep 14 17:14:05"
                }
              },
              "ecs": {
                "version": "8.0.0"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "remove",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "input": {
                "type": "filestream"
              },
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "@timestamp": "2023-09-14T21:14:08.829Z",
              "system": {
                "syslog": {
                  "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
                  "timestamp": "Sep 14 17:14:05"
                }
              },
              "ecs": {
                "version": "8.0.0"
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              },
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ]
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "rename",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T21:14:08.829Z",
              "system": {
                "syslog": {
                  "timestamp": "Sep 14 17:14:05"
                }
              },
              "ecs": {
                "version": "8.0.0"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "date",
          "status": "success",
          "if": {
            "condition": "ctx.event.timezone == null",
            "result": true
          },
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T17:14:05.000Z",
              "system": {
                "syslog": {
                  "timestamp": "Sep 14 17:14:05"
                }
              },
              "ecs": {
                "version": "8.0.0"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "date",
          "status": "skipped",
          "if": {
            "condition": "ctx.event.timezone != null",
            "result": false
          }
        },
        {
          "processor_type": "remove",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T17:14:05.000Z",
              "system": {
                "syslog": {}
              },
              "ecs": {
                "version": "8.0.0"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "set",
          "status": "success",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T17:14:05.000Z",
              "system": {
                "syslog": {}
              },
              "ecs": {
                "version": "8.0.0"
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z",
                "kind": "event"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        },
        {
          "processor_type": "append",
          "status": "success",
          "if": {
            "condition": "ctx.host?.hostname != null && ctx.host?.hostname != ''",
            "result": true
          },
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_version": "-3",
            "_source": {
              "agent": {
                "name": "syslogf.sti.usherbrooke.ca",
                "id": "7f080758-7e4f-4d73-9109-2a05b9e1a0b7",
                "type": "filebeat",
                "ephemeral_id": "e7efe019-569a-4ea6-8748-22545cb6a0ce",
                "version": "8.6.2"
              },
              "process": {
                "name": "sshd",
                "pid": 120773
              },
              "log": {
                "offset": 11246,
                "file": {
                  "path": "/var/log/facultes/FSCI/fsci-secure.log"
                }
              },
              "@metadata": {
                "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
                "beat": "filebeat",
                "raw_index": "filebeat-fsci-8.6.2-sys-linux",
                "type": "_doc",
                "version": "8.6.2"
              },
              "message": "pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.32.104.212 user=quiy2001",
              "tags": [
                "audit",
                "syslog",
                "FSCI"
              ],
              "input": {
                "type": "filestream"
              },
              "@timestamp": "2023-09-14T17:14:05.000Z",
              "system": {
                "syslog": {}
              },
              "ecs": {
                "version": "8.0.0"
              },
              "related": {
                "hosts": [
                  "dinf-miro"
                ]
              },
              "host": {
                "name": "syslogf.sti.usherbrooke.ca",
                "hostname": "dinf-miro"
              },
              "event": {
                "ingested": "2023-09-14T21:19:14.220979552Z",
                "kind": "event"
              }
            },
            "_ingest": {
              "pipeline": "filebeat-8.6.2-system-syslog-pipeline",
              "timestamp": "2023-09-14T21:19:14.220979552Z"
            }
          }
        }
      ]
    }
  ]
}

Does this input make more sense to you? In the output, though, some fields are still missing. I'm totally confused.

Yanick

There is something going on... it is unclear to me.

Well, I'm totally confused too. I only see the final result, and when I use the module the data is parsed. I wonder whether there may be other ingest pipelines involved for Filebeat.

I'm totally confused too.

Let me make some more tests and I'll get back to you!

Thanks again for your help, it is very much appreciated!

Yanick

Hi @yquirion, let's start over.

Apologies, but with all the pieces and inconsistent messages, I have lost context.

Show me how it all works and parses the user etc... then perhaps I can help....

Please show me everything exactly as you configured it for the case you say is working.

I do not know what is working

Is it
a) Source Log File -> Filebeat (system module) -> Elasticsearch
b) Source Log File -> Filebeat (system module) -> Logstash -> Elasticsearch

I am not sure if "working" is a) or b)

Simply show me the following for when it is "Working"

A source syslog log file (5 lines)

Your entire filebeat.yml

Your entire system.yml

your entire logstash.conf, not pieces

and the resulting JSON in Elasticsearch (not from the logstash output) of the proper / working results

Show me what works in detail and perhaps I can make the other work...

Hi @stephenb,

No problem, give me a few minutes to reconfigure it using the system module and disable the filestream input.

That being said, I may have a clue why the parsing isn't good with the filestream input; I had this in my filebeat.yml:

  parsers:
    - syslog:

I had run a test before playing with the pipeline and forgot to remove it. After I removed it, the parsing seems a bit better, but the timezone is 4 hours behind (not a big deal for now).

That being said, to avoid any confusion, I will send you the following in another reply (not ready yet):
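
For reference, here is a rough, untested sketch of what the filestream input might look like once the syslog parser is removed. The path, tags and pipeline name are the ones from the outputs above; the id value is only a placeholder. The add_locale processor is an assumption on my part: as far as I know the system module adds it to set event.timezone, and in the simulate output above the date processor guarded by "ctx.event.timezone != null" was skipped, which might explain the 4-hour offset.

```yaml
filebeat.inputs:
  - type: filestream
    id: fsci-secure-log            # placeholder id, pick any unique value
    paths:
      - /var/log/facultes/FSCI/fsci-secure.log
    tags: ["audit", "syslog", "FSCI"]
    # Ingest pipeline that the system module would normally select
    pipeline: filebeat-8.6.2-system-syslog-pipeline
    # No "parsers: - syslog:" entry here, so the raw line stays in "message"
    # and the ingest pipeline does the parsing.
    processors:
      # Assumption: sets event.timezone so the pipeline can interpret
      # the local syslog timestamp instead of treating it as UTC.
      - add_locale: ~
```

If the timestamps are still off with this in place, the offset probably comes from somewhere else (Logstash, for example).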

A source syslog log file (5 lines)

Your entire filebeat.yml

Your entire system.yml

your entire logstash.conf, not pieces

and the resulting JSON in Elasticsearch (not from the logstash output) of the proper / working results

Please keep in mind that the pipeline.conf I will send you is also used for other purposes. You only need to look at the "if" block that matches the input.

I'll prepare everything and send it to you (I will upload the files instead of pasting them here).

Thanks!
Yanick