I am trying to parse different logs with Logstash. Right now I have only one log file coming from multiple servers into Logstash, and parsing works OK. But I am planning to add multiple log types (syslog, Apache, Windows, nginx, etc.) and send them to different indexes on Elasticsearch.
Can someone suggest:
- how I can add different inputs and filters and send events to different indexes
- whether I need different configs, and how to do that
- what is recommended regarding the load on the Logstash cluster
Below is my Logstash config:
input {
  beats {
    client_inactivity_timeout => 86400
    port => 5044
  }
}
filter {
  mutate {
    gsub => [
      "message", "\t", " ",
      "message", "\n", " "
    ]
  }
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp_match}\]%{SPACE}\:\|\:%{SPACE}%{WORD:level}%{SPACE}\:\|\:%{SPACE}%{USERNAME:host_name}%{SPACE}\:\|\:%{SPACE}%{GREEDYDATA:coidkey}%{SPACE}\:\|\:%{SPACE}%{GREEDYDATA:clientinfo}%{SPACE}\:\|\:%{SPACE}(%{IP:clientip})?%{SPACE}\:\|\:%{SPACE}%{GREEDYDATA:Url}%{SPACE}\:\|\:%{SPACE}%{JAVACLASS:class}%{SPACE}\:\|\:%{SPACE}%{USER:ident}%{SPACE}%{GREEDYDATA:msg}" }
    remove_field => [ "ident", "offset", "name", "version", "host" ]
  }
}
output {
  stdout { codec => rubydebug }
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/tmp/grok_failures.txt" }
  } else {
    elasticsearch {
      hosts => "dfsyselastic.df.jabodo.com:9200"
      user => "UN"
      password => "PW"
      index => "vicinio-%{+YYYY.MM.dd}"
      document_type => "log"
    }
  }
}
If you have a way to tell which kind of log message you are receiving, then you will be able to set the event's "type". Then in the filter section you'd be able to write something along the lines of:
filter {
  if [type] == "syslogs" {
    grok { }
  }
  if [type] == "apache" {
    grok { }
  }
  # ...etc...
}
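One common way to tag each log at the source is to add a custom field per prospector in filebeat.yml, which the conditionals above can then test. This is a sketch, not your actual setup: the path is illustrative, and `fields` / `fields_under_root` are standard Filebeat prospector options:

fields_under_root: true puts the custom field at the top level of the event, so the Logstash conditional can read it as [type].

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/syslog        # illustrative path
  fields:
    type: syslogs            # custom field the Logstash filter can test
  fields_under_root: true    # place "type" at the top level of the event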
Then, for storing them in different indices:
output {
  stdout { codec => rubydebug }
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/tmp/grok_failures.txt" }
  }
  if [type] == "syslogs" {
    elasticsearch {
      hosts => "dfsyselastic.df.jabodo.com:9200"
      user => "UN"
      password => "PW"
      index => "syslogs-vicinio-%{+YYYY.MM.dd}"
      document_type => "syslog"
    }
  }
  if [type] == "apache" {
    elasticsearch {
      hosts => "dfsyselastic.df.jabodo.com:9200"
      user => "UN"
      password => "PW"
      index => "apache-vicinio-%{+YYYY.MM.dd}"
      document_type => "apache"
    }
  }
}
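If the outputs only differ by the index name, the per-type elasticsearch blocks could also be collapsed into one by interpolating the field into the index name with Logstash's sprintf syntax. A sketch, assuming every event reliably carries a type value that is safe to use in an index name:

output {
  elasticsearch {
    hosts => "dfsyselastic.df.jabodo.com:9200"
    user => "UN"
    password => "PW"
    index => "%{type}-vicinio-%{+YYYY.MM.dd}"
  }
}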
As for the load on the Logstash cluster: how many logs are you expecting, and over what amount of time (i.e. 100 logs per minute, hour, second, millisecond)?
I do not think it will be able to identify the type on a single port unless there is some way of determining where the message was sent from. I have not used the Beats plugin before, but I do have experience working with the tcp, udp, and file input plugins. For tcp and udp there is a field called host which holds the IP address of the server that sent the message. Since each server has a unique, static IP address, I can determine where a message came from and therefore which type it is. I am unsure whether Beats sets this field. If it does, then you do not need the
if [type] == "" .....
and you can instead say:
if [host] == "some indicator" { }
If there is no way to uniquely identify the messages on one port, then you can have each server send its logs to a different port, and you would be able to identify them based on which port they came in on.
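The port-per-type approach above can be sketched with multiple beats inputs. add_field is a common option available on all Logstash inputs; the ports and the log_source field name are illustrative, not something from your config:

input {
  beats {
    port => 5044
    add_field => { "log_source" => "syslogs" }   # tag events arriving on this port
  }
  beats {
    port => 5045
    add_field => { "log_source" => "apache" }
  }
}
filter {
  if [log_source] == "syslogs" {
    # syslog-specific grok here
  }
}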
Thanks again. I will try a few things as per your advice.
Regarding your question: the logs coming to the Logstash cluster will be approximately 90-100K per minute, maybe more. I have 4 Logstash servers to distribute the load. Here is the Filebeat config:
#=========================== Filebeat prospectors =============================
filebeat.prospectors:
- input_type: log
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /archives/logs/tomcat7-8080/download.log
    - /archives/logs/tomcat7-8090/download.log
  tail_files: true
  multiline.pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  # hosts: ["lvsyslogstash1.lv.jabodo.com:5044"]
  hosts: ["lvsyslogstash1.lv.jabodo.com:5044","lvsyslogstash2.lv.jabodo.com:5044","lvsyslogstash3.lv.jabodo.com:5044","lvsyslogstash4.lv.jabodo.com:5044"]
  loadbalance: true
  worker: 2
  # filebeat.publish_async: true
I do not think there will be a problem with this load, especially if it is distributed evenly across the 4 Logstash servers. I am not 100% certain, though.