What is the purpose of the type field in the input section?

Hello All,

What's the use of "type => " in the input section of Logstash? I will be using a grok filter in the filter section to parse incoming messages anyway.

Thanks,
gaurav

The type option sets the value of the field with the same name. If you only ingest a single kind of log (and never will do anything else) you don't have to worry about it, but in all likelihood you'll eventually want to process different kinds of logs and then the type field will be a good way of distinguishing them.
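As a rough sketch of what that looks like in practice (the file path and the type names below are made up for illustration, not taken from your setup), each input stamps its events with a type, and the filter section can then branch on it:

```
input {
  tcp {
    port => 5000
    type => "syslog"          # every event from this input gets type "syslog"
  }
  file {
    path => "/var/log/nginx/access.log"   # hypothetical path
    type => "nginx_access"                # hypothetical type name
  }
}

filter {
  # Only events that arrived via the tcp input above are parsed as syslog.
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
  }
}
```

With a single input and a single log format, the conditional adds nothing and type can be left out.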

Thanks.

The problem is:
I have a data center with, say, 3 networking providers: A, B, and C. Their syslog formats are similar but not identical (some have extra spaces and some don't; some include the program name and severity and some don't).

How can I set the type field in the input section before seeing the actual syslog message?

Ideally it should look something like this:
input {
  type => A

  type => B

  type => C
  ....
}

That's why I don't understand the purpose of the type field in the input section.

How are you sending the logs to LS - TCP, file, syslog?

I am doing a POC here, so I have Logstash configured like this:

input {
  tcp {
    port => 5000
  }
  udp {
    port => 5000
  }
}

Then I run telnet localhost 5000 from the same host and manually paste in the syslog messages I have.

In production, I will make sure all network devices publish their syslog messages to some port, and Logstash will listen on that port so that it receives a continuous stream of syslog messages.

I don't see why you'd have to use different types for different kinds of syslog messages. When you're searching for messages in Kibana, why should your query be affected by the vendor of the device producing the events you're interested in?

Thanks. My requirement is :

Say I have 3 networking routers, A, B, and C, which produce syslog messages. Their messages arrive at the Logstash instance at any time and in any order. How can I search in Kibana if I want to know how many messages I received from vendor A during a certain time frame?

You could indeed use different types for this, I just suspect there are better criteria. Wouldn't e.g. the hostname be a more interesting condition?

Anyway, to set different types depending on what the message looks like, something like this would work:

filter {
  if [message] =~ /regexp that matches vendor A/ {
    mutate {
      replace => { "type" => "syslog_A" }
    }
  }
}

Another way, which I'd probably use, is to have multiple grok filters that match each vendor's messages (I'm just assuming they have different formats). This avoids regexp duplication: each grok adds a vendor-specific tag that you can later translate into a change of the type field.

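filter {
  # Try vendor A's format first.
  grok {
    match => ...
    add_tag => ["syslog_A"]
  }
  # Only try vendor B's format if the previous grok failed.
  if "_grokparsefailure" in [tags] {
    grok {
      match => ...
      add_tag => ["syslog_B"]
    }
  }
  ...
  # Translate the vendor tag into the type field.
  if "syslog_A" in [tags] {
    mutate {
      replace => { "type" => "syslog_A" }
      remove_tag => ["syslog_A"]
    }
  }
}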

Excellent. Thanks a lot for valuable suggestion. Appreciate your help.

How can I use multiple match => statements within a grok filter?
For example, the first line:
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }

On the second line I have to reuse what I parsed in the first line.
I want to match again, like this:
match => {"syslog_hostname"=> some REGEX}

How can I do that in the Logstash config?

If you want to try multiple grok expressions on a field and break on the first match:

grok {
  match => {
    "fieldname" => ["expression1", "expression2"]
  }
}

If you want to match against different fields it might work to have multiple match options in the same filter, but I'd use two consecutive filters.
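For the single-field case, a sketch might look like the following. The second expression is only my guess at what a vendor format without a program name could look like; adjust both to your real messages.

```
filter {
  grok {
    match => {
      "message" => [
        # Tried first: timestamp, host, program with optional [pid], then the text.
        "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}",
        # Tried only if the first fails: assumed variant with no program name.
        "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{GREEDYDATA:syslog_message}"
      ]
    }
  }
}
```

Order matters here: put the most specific expression first, since the looser one would also match messages that do carry a program name.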

Thanks for the reply.
The problem is simple.
1: I need to parse the hostname from the syslog string.
I am able to do that with a match expression.
2: With the parsed hostname, I need to apply a regex to find out which vendor it is. Assume that we name syslog hosts in a certain way so that the hostname contains the vendor name,
e.g. hostname = region.city.vendor_name.company.com

How can I parse the hostname and get vendor_name in this case?

Just use a separate grok filter for that, somewhere after the current filter that extracts the hostname field.
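Something along these lines, assuming the vendor is always the third dot-separated label of the hostname as in your example. The field name vendor_name and the regex are my own illustration, so treat this as a sketch:

```
filter {
  # First filter: extract syslog_hostname (and the other fields) from the message.
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }

  # Second filter: runs only if the hostname was extracted.
  if [syslog_hostname] {
    grok {
      # Skip two labels, then capture the third one into vendor_name.
      match => { "syslog_hostname" => "^[^.]+\.[^.]+\.(?<vendor_name>[^.]+)\." }
    }
  }
}
```

The (?<name>...) form is grok's syntax for an inline named capture, so no custom pattern file is needed for a one-off regex like this.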

I did this, and I am NOT able to see MONTH, DAY, and TIME in the Logstash output.

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }

    match => { "syslog_timestamp" => "%{MONTH:month} +%{MONTHDAY:day} %{TIME:time}" }

    remove_field => ["@version", "host", "message", "port"]
  }

  date {
    target => "@timestamp"
    match => [ "syslog_timestamp",
               "MMM  d HH:mm:ss",
               "MMM dd HH:mm:ss" ]
    timezone => "UTC"
  }
}

Did you try my suggestion of having two consecutive filters instead of two match options in the same filter?
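A sketch of that, reusing the patterns from your config unchanged and only splitting the grok in two. As far as I know grok stops at the first successful match by default (break_on_match => true), which may be why the second pattern never ran when both match options were in one filter:

```
filter {
  # First grok: split the raw message into syslog_* fields.
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}\s+%{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    remove_field => ["@version", "host", "message", "port"]
  }

  # Second grok: break the already-extracted timestamp into month, day, and time.
  grok {
    match => { "syslog_timestamp" => "%{MONTH:month} +%{MONTHDAY:day} %{TIME:time}" }
  }

  date {
    target => "@timestamp"
    match => [ "syslog_timestamp",
               "MMM  d HH:mm:ss",
               "MMM dd HH:mm:ss" ]
    timezone => "UTC"
  }
}
```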

Thanks, it worked.
Could you please take some of your valuable time to reply to my question here?

I appreciate your help.

Thanks. I tried this, but I think there is a problem.
Let us say I pass a syslog_B type log first.

It goes to the first grok block, and since it does not match, grok puts _grokparsefailure into tags. Then it goes to the following if block and gets a match there. So for a syslog_B type message I always end up with two tags: the failure tag and the syslog_B tag.

How do I avoid this? I could do it with your first option, but the second one is more elegant and feels right :)

Thanks,
Gaurav

If you don't want the _grokparsefailure tag you can remove it in each grok filter with remove_tag. That option is only triggered when the grok is successful, just like add_tag. Then you won't get a _grokparsefailure tag in the end if one of the grok expressions matched, but if none of them matched the tag will be there.
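Roughly like this. VENDOR_A_PATTERN and VENDOR_B_PATTERN are placeholders for your real grok expressions, not actual pattern names:

```
filter {
  grok {
    match => { "message" => "VENDOR_A_PATTERN" }   # placeholder expression
    add_tag => ["syslog_A"]
  }

  # The first grok failed, so try the next vendor's format.
  if "_grokparsefailure" in [tags] {
    grok {
      match => { "message" => "VENDOR_B_PATTERN" } # placeholder expression
      add_tag => ["syslog_B"]
      # Applied only when this grok matches: clears the failure left by the first one.
      remove_tag => ["_grokparsefailure"]
    }
  }
}
```

A syslog_B message then ends up with only the syslog_B tag, while a message that matches neither expression keeps _grokparsefailure, which is handy for finding unparsed events later.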
