Hey community,
I have to set up an ELK stack, and everything ran perfectly until I had to implement syslog. My problem is that the syslog message I receive in Logstash differs from the syslog line that rsyslog writes (located in /var/log/syslog).
Here is what I found in /var/log/syslog:
Jul 26 09:06:39 hostname.local openvpn[46263]: write UDPv4: No buffer space available (code=55)
And here is the message in Logstash:
<27>Jul 26 09:06:39 openvpn[46263]: write UDPv4: No buffer space available (code=55)
No big difference, right? Well, the hostname is missing and there is that weird number at the beginning (tell me if you know what it is). So the grok pattern (see below) is not working 100%.
My config:
input {
  tcp {
    port => 5000
    type => "syslog"
  }
  udp {
    port => 5000
    type => "syslog"
    codec => plain
  }
  # stdin { type => syslog }
}

filter {
  if [type] == "syslog" {
    # generic syslog lines
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_tag => [ "syslog" ]
    }
    # Apache access logs
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
      add_tag => [ "apache" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
      target => "syslog_timestamp"
    }
    date {
      match => [ "apache_timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      target => "apache_timestamp"
    }
    if "beats_input_codec_plain_applied" in [tags] {
      mutate {
        remove_tag => ["beats_input_codec_plain_applied"]
      }
    }
    # events that arrive with the logserver's own hostname get the parsed hostname instead
    if [host] == "logserver" {
      mutate {
        replace => [ "host", "%{syslog_hostname}" ]
        remove_field => [ "syslog_hostname" ]
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
    user => "user"
    password => "super_secret-CENSORED-pw"
    ssl => false
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
And here is what I get in return:
{
"tags" => [
[0] "_grokparsefailure"
],
"message" => "<27>Jul 26 09:06:39 openvpn[46263]: write UDPv4: No buffer space available (code=55)",
"host" => "xx.xxx.x.xx", # which is not the hostname but a ip address that never changes
"type" => "syslog",
"@version" => "1",
"@timestamp" => 2019-07-26T09:06:39.496Z
}
Why is there a difference? Why doesn't it use the default syntax of a normal syslog message?
I don't fully understand your setup, could you elaborate on it a bit? What is sending the syslog messages, rsyslog? Logstash is only ever receiving, never sending, right? At least I don't see a syslog output.
if "beats_input_codec_plain_applied" in [tags] {
mutate {
remove_tag => ["beats_input_codec_plain_applied"]
}
}
Right. I have different servers with different services running on them, and those servers send their logs to this one logserver via syslog (over UDP).
I have two filters, one for generic syslog messages and one for Apache logs. I also noticed that if I paste in a log line that includes the hostname, the host field still ends up as the hostname of the logserver. I don't want that, so I replace the value of host with syslog_hostname and then delete syslog_hostname.
Sorry, I meant that I am not sure about your environment configuration; I understood the Logstash config.
But now I also understand your environment. So, to tackle the hostname issue: the sending client is not including the hostname in the syslog message, right? Then it is a config issue on the client, which is why I was asking about rsyslog.
Or does Logstash mystically remove that part?
Please be more precise. Syslog is a standard, not a product. Rsyslog and syslog-ng are the most common syslog servers; /var/log/syslog is just a file that one of those servers writes to.
So you have the /var/log/syslog file, which includes the hostname in the syslog message. But when you read it with Filebeat, which sends it to Logstash, the hostname is missing, correct?
And did you check with tcpdump that Filebeat strips the hostname from the message?
Currently I read the UDP stream from syslog directly with Logstash, process it, and send it to Elasticsearch. But then I noticed that the hostname is missing. A few minutes ago I ran tcpdump to inspect the raw packets from syslog, and there I saw that Logstash is not removing the hostname; syslog simply doesn't send it. Logstash takes the raw packet from UDP as it is.
The logs written to /var/log/syslog are apparently modified afterwards, which is when the hostname gets written in.
I am now trying to read the syslog file with Filebeat instead, so that the hostname is included in what arrives at Logstash.
Maybe I am starting to grasp what is going on here. So you have a syslog server which writes to /var/log/syslog, and you have also configured it to send the events to Logstash over UDP?
If this is the case, your syslog server is using a different template for writing to the log file than for sending the events over the network. You have to check the server's manual.
If you just want to read the /var/log/syslog file with Filebeat, there shouldn't be any issue and you should be able to parse it properly with Logstash, provided you add the syslog priority to your grok pattern (or add a new pattern containing it, if you are also receiving syslog messages that don't include the priority).
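By the way, the <27> at the beginning is the syslog priority (PRI) value. Something along these lines might work as a starting point; this is only an untested sketch that treats both the priority and the hostname as optional, and the field name syslog_pri is just my own choice:

filter {
  grok {
    # priority (e.g. <27>) and hostname are both optional here
    match => { "message" => "(?:<%{NONNEGINT:syslog_pri}>)?%{SYSLOGTIMESTAMP:syslog_timestamp} (?:%{SYSLOGHOST:syslog_hostname} )?%{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    add_tag => [ "syslog" ]
  }
}

If you do capture the priority into syslog_pri, the syslog_pri filter plugin can decode it into facility and severity for you.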
Well, I managed to get the messages in with Filebeat. Previously I used Logstash alone to receive the syslog messages.
I can't run the syslog service and Logstash concurrently because they would need the same port. Now I don't listen for syslog with Logstash anymore; instead I read the syslog file with Filebeat, which gives me the hostname. Filebeat then sends the events to Logstash, where I can filter them.
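In case it helps someone else, the input section now looks roughly like this (5044 is just the usual Beats port I picked, adjust it to wherever your Filebeat points):

input {
  beats {
    port => 5044        # assumed: default Beats port, change if needed
    type => "syslog"    # so the existing 'if [type] == "syslog"' conditional still applies;
                        # note this will not override a type the event already carries
  }
}

As far as I can tell, the grok and date filters from above can then stay as they are.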
I still don't fully understand the second paragraph (where all those components and services are running), but if you are happy with it, then great.