Logstash workflow design - pipelines and configuration files

Hi all,

I'm new to Elastic and Logstash. My scenario is a large IT environment and I would like to collect logs from various sources. First I would like to start with the most basic ones, such as Filebeat collecting syslog, plus some other common modules for Apache, IIS, etc. It would be great to keep all the data relevant and flexible for filtering and enriching, so I'm thinking of this flow: Filebeat modules [syslog, apache] -> Logstash main pipeline -> one multi.conf file with input/filter/output logic for all modules, separated by if conditions (I can't get it working):

input {
  beats {
    port => 5044
  }
}

filter {
  if [fileset][name] == "syslog" {
    if [message] =~ /last message repeated [0-9]+ times/ {
      drop { }
    }
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    mutate {
      add_tag => ["syslog"]
    }
  }
}

filter {
  if [fileset][name] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" }
      overwrite => [ "message" ]
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      add_tag => ["apache"]
    }
  }
}

filter {
  if [fileset][name] == "nginx" {
    grok {
      # nginx's default access log uses the same "combined" format as Apache
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      add_tag => ["nginx"]
    }
  }
}

output {
  if "syslog" in [tags] {
    elasticsearch {
      hosts => "https://XXXXXXXXXXXXX"
      index => "syslog-%{+YYYY.MM.dd}-%{[@metadata][version]}"
    }
  }

  if "apache" in [tags] {
    elasticsearch {
      hosts => "https://XXXXXXXXXXXXX"
      index => "apache-%{+YYYY.MM.dd}-%{[@metadata][version]}"
    }
  }

  if "nginx" in [tags] {
    elasticsearch {
      hosts => "https://XXXXXXXXXXXXX"
      index => "nginx-%{+YYYY.MM.dd}-%{[@metadata][version]}"
    }
  }
}

OR, alternatively, separate conf files on Logstash: one for the beats input, several conf files for the filters (10-syslog-filter.conf, 11-apache-filter.conf, ...), and an output file with several if statements that separate the workflow based on tags (syslog, apache) and write to separate indexes (syslog-{date}, apache-{date}, etc.).
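For reference, what I have in mind for that second option is something like this (file names and paths are only examples); a single pipeline pointed at the directory would load all the files and concatenate them in lexical order:

# /etc/logstash/conf.d/ -- one pipeline, files concatenated in lexical order
#   02-beats-input.conf            (the beats input above)
#   10-syslog-filter.conf
#   11-apache-filter.conf
#   30-elasticsearch-output.conf   (the tag-based if statements and indexes)

# pipelines.yml
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"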

So my first question is: which is the better way to separate the different filters and logic, one big multi.conf file or separate files? Also, should I use multiple pipelines for this scenario, or is one single pipeline with configuration file(s) sufficient for a Beats input scenario to begin with? I've read most of the documentation but couldn't find an example or "best practice" for a similar scenario.

Thanks in advance,
Hristo.

You can do this multiple ways. I would define a single Filebeat input pipeline that then outputs (based on the tag in the document) to another Logstash pipeline that is waiting on a pipeline input.

This way you keep the configurations and individual pipelines separate, and you don't have to worry about a configuration mistake in one of them stopping a single pipeline that has everything.

This way you also keep the distribution logic at the top level and then don't have to worry about too many if statements in the corresponding config files.
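A rough sketch of that layout using Logstash's pipeline-to-pipeline communication (the pipeline ids, paths, and fileset conditions below are just placeholders to illustrate the idea):

# pipelines.yml
- pipeline.id: beats-distributor
  config.string: |
    input { beats { port => 5044 } }
    output {
      if [fileset][name] == "syslog" {
        pipeline { send_to => [syslog] }
      } else if [fileset][name] == "apache" {
        pipeline { send_to => [apache] }
      }
    }
- pipeline.id: syslog
  path.config: "/etc/logstash/pipelines/syslog.conf"
- pipeline.id: apache
  path.config: "/etc/logstash/pipelines/apache.conf"

# each downstream config then starts with, e.g.:
# input { pipeline { address => syslog } }

Each downstream pipeline only contains its own filters and output, so a mistake in one of them doesn't take down the others.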

You can also define the index name as a field and then use a single output that just replaces that field in the index parameter.
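For example, a minimal sketch of that single-output idea (the [@metadata][index_prefix] field name is just an example, not anything built in):

filter {
  if "syslog" in [tags] {
    mutate { add_field => { "[@metadata][index_prefix]" => "syslog" } }
  } else if "apache" in [tags] {
    mutate { add_field => { "[@metadata][index_prefix]" => "apache" } }
  }
}

output {
  elasticsearch {
    hosts => "https://XXXXXXXXXXXXX"
    index => "%{[@metadata][index_prefix]}-%{+YYYY.MM.dd}"
  }
}

Putting it under [@metadata] keeps the helper field out of the documents that actually get indexed.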

I recommend you read the Logstash documentation on pipeline-to-pipeline communication.

00_filebeat_input >>> 10-syslog-filter.conf >>> zzz_output.conf
                  >>> 11-apache-filter.conf ^^^
