How to tag log files in filebeat for logstash ingestion?

I have several different log files in a directory that I want to forward to Logstash for grok'ing and buffering, and then on to downstream Elasticsearch.

As the files are coming out of Filebeat, how do I tag them with something so that logstash knows which filter to apply?

So if I have:

paths:
       - /path/to/logs/access.log
       - /path/to/other/logs/errors.log

What do I do with the beats input within logstash that allows me to send the right logs to the right grok filter?


You can add custom fields to the events, which you can then use for conditional filtering in Logstash.

You can define multiple prospectors in the Filebeat configuration. So group the files that need the same processing under the same prospector so that the same custom fields are added.


So to do this, I would need to define multiple prospectors?

I was really hoping that I could define the paths, and then be able to use some sort of macro/variable within the fields variable. So something like:

paths:
  - /path/to/logs/access.log
  - /path/to/other/logs/errors.log
fields: %{filename}

Where the field coming across the wire for each beat would be 'access.log' and 'errors.log'.

If you want to apply conditional filtering based on the filename then use the source field.
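
For example, a minimal sketch of a filter that branches on the file path (the tag names here are arbitrary placeholders):

filter {
  # 'source' carries the full path of the file the event came from
  if [source] =~ /access\.log$/ {
    mutate { add_tag => ["access"] }
  } else if [source] =~ /errors\.log$/ {
    mutate { add_tag => ["errors"] }
  }
}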

Would the alternative (the custom fields method you suggested, using multiple prospectors) look something like this?

filebeat:
  prospectors:
    - paths:
        - /path/to/logs/access.log
      fields: {log_type: access}
    - paths:
        - /path/to/other/logs/errors.log
      fields: {log_type: errors}

If so, what are the options on the Logstash side for setting the type in the beats 'input', so that each message is tagged appropriately and my filter can direct it to the right grok (I'm still learning about the macros available)?

I could see wanting to say something like:

input {
    beats {
        port => 5044
        type => %{log_type}
    }
}

Thanks for your help, by the way. I haven't been able to find any explicit examples of how to send messages from Beats to Logstash, where Logstash handles the grok and parsing into JSON for downstream Elasticsearch.

Digging deeper, it seems like I should be able to say:

type => "%{[@metadata][log_type]}"

or would it be

type =>"%{[log_type]}"

Or if I were trying to use the source field, using:

type => "%{[@metadata][source]}"

Does that look right? Is that how I would access a field within logstash, so that I can appropriately type the data within the input?

The first part is correct on its own, but you may want to change it based on how you intend to write your Logstash config.

The second part won't work. You cannot override the type field once it's set (this is documented for the logstash-input-beats plugin). If you want to change the type, set document_type: access in your Filebeat configuration.
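
For instance, a minimal sketch of setting it per prospector (the type names here are just placeholders):

filebeat:
  prospectors:
    - paths:
        - /path/to/logs/access.log
      document_type: access
    - paths:
        - /path/to/other/logs/errors.log
      document_type: errors

Each event from those prospectors then arrives in Logstash with type already set to access or errors.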

Here's an example Logstash config based on the Filebeat config you gave:

Filebeat:

filebeat:
  prospectors:
    - paths:
        - /path/to/logs/access.log
      fields: {log_type: access}
    - paths:
        - /path/to/other/logs/errors.log
      fields: {log_type: errors}

Logstash:

input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][log_type] == "access" {
    mutate {
      add_field => { "foo" => "var" }
    }
  }
}

output {
  stdout { codec => rubydebug }
}
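
And if you want each log type to go to its own grok filter (which was the original goal), the same conditional can dispatch to separate grok blocks; the patterns below are only placeholders, not matched against your real log formats:

filter {
  if [fields][log_type] == "access" {
    grok {
      # placeholder pattern for an Apache/Nginx-style access log
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  } else if [fields][log_type] == "errors" {
    grok {
      # placeholder pattern; adjust to your actual error log format
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:error_message}" }
    }
  }
}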

Thank you so much. That will work!

I'm guessing this is related to the {dynamic_type} in the template file? Is there no way to dynamically set that type based upon the file input? Would I need to create a separate mapping for each of the different log types, identify those mappings in the yml file, and then do it that way?

In the end, with what you've shown me, I don't think it matters much. As long as I can quickly filter and grok based upon the incoming message type, then I'm fine. I'm just trying to learn a little more about how filebeat works at this point.

No, it is because the logstash-input-beats plugin doesn't allow type to be overridden if it's already set.

In Filebeat you can use document_type: mytype on a per-prospector basis. And in Logstash there are many ways to do this. My personal preference would be to set the type at the source using document_type. But here's an arbitrary Logstash example.

filter {
  if ([source] =~ /access\.log$/) {
    mutate {
      replace => {
        "[@metadata][type]" => "access"
        "[type]" => "access"
      }
    }
  }
}

The provided index template will handle this assuming you are writing all these types to a filebeat-* index.
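
For example, an elasticsearch output along these lines would pick the type up (the host is a placeholder; the index/document_type values follow the usual Filebeat naming):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # writes to filebeat-YYYY.MM.dd so the provided index template applies
    index => "filebeat-%{+YYYY.MM.dd}"
    # uses the type set by document_type in Filebeat
    document_type => "%{[@metadata][type]}"
  }
}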

Hi, I'm new to ELK and was trying to figure out how I can differentiate between multiple files in /var/log/kafka/*.log.
These are the files I have in that path:
controller.log, kafka-authorizer.log, kafkaServer-gc.log, log-cleaner.log, state-change.log, kafka-request.log, server.log

Filebeat:

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/kafka/*.log

Logstash:

input {
  beats {
    port => 5044
    tags => [ "kafka" ]
  }
}

output {
  if "kafka" in [tags] {
    elasticsearch {
      action => "index"
      hosts => "elasticsearch:port"
      index => "kafka"
    }
  }
}

bob-bza,

Your config worked for me, but what if you have multiple paths configured in Filebeat and you want a separate index for each path?

For example what if I have this in my filebeat.yml:

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/kafka/*.log
      - /var/log/message

Then how do I configure my Logstash config file so that the kafka logs go to the "kafka" index and the message logs go to the "message" index?

For your case you can use multiple prospectors and define fields / tags for each prospector.
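
For example, a rough sketch (the log_source field name is arbitrary, and the hosts value is a placeholder just like in the config above):

Filebeat:

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/kafka/*.log
    fields:
      log_source: kafka
  - input_type: log
    paths:
      - /var/log/message
    fields:
      log_source: message

Logstash:

output {
  if [fields][log_source] == "kafka" {
    elasticsearch {
      hosts => "elasticsearch:port"
      index => "kafka"
    }
  } else if [fields][log_source] == "message" {
    elasticsearch {
      hosts => "elasticsearch:port"
      index => "message"
    }
  }
}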