Syslog message is incomplete

Hey community,
I have to set up an ELK Stack, and everything ran perfectly until I had to implement syslog. My problem is that the syslog messages I receive in Logstash differ from the ones rsyslog writes to /var/log/syslog.

Here is what I found in /var/log/syslog:

Jul 26 09:06:39 hostname.local openvpn[46263]: write UDPv4: No buffer space available (code=55)

And here is the message in Logstash:

<27>Jul 26 09:06:39 openvpn[46263]: write UDPv4: No buffer space available (code=55)

No big difference, right? Well, the hostname is missing and there is that weird number at the beginning (tell me if you know what it is), so my grok pattern (see below) doesn't fully match.

My config:

input {
  tcp {
    port => 5000
    type => syslog
  }
  udp {
    port => 5000
    type => syslog
    codec => plain
  }
#  stdin { type => syslog }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_tag => [ "syslog" ]
    }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
      add_tag => [ "apache" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
      target => "syslog_timestamp"
    }
    date {
      match => [ "apache_timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      target => "apache_timestamp"
    }
    if "beats_input_codec_plain_applied" in [tags] {
      mutate {
        remove_tag => ["beats_input_codec_plain_applied"]
      }
    }
    if [host] == "logserver" {
      mutate {
        replace => [ "host", "%{syslog_hostname}" ]
        remove_field => [ "syslog_hostname" ]
      }
    }
  }
}

output {
  elasticsearch {
    hosts => localhost
    user => user
    password => "super_secret-CENSORED-pw"
    ssl => false
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}

And here is what I get in return:

{
          "tags" => [
        [0] "_grokparsefailure"
    ],
       "message" => "<27>Jul 26 09:06:39 openvpn[46263]: write UDPv4: No buffer space available (code=55)",
          "host" => "xx.xxx.x.xx", # not the hostname, but an IP address that never changes
          "type" => "syslog",
      "@version" => "1",
    "@timestamp" => 2019-07-26T09:06:39.496Z
}

Why is there a difference? Why isn't it in the usual syslog format?

I have that weird number at the beginning (tell me if you know what it is).

It is the syslog priority (PRI) value, see: https://tools.ietf.org/html/rfc5424#section-6.2.1
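To make the priority concrete: per RFC 5424 (section 6.2.1), PRI = facility × 8 + severity, so `<27>` decodes to facility 3 (system daemons) and severity 3 (error). Below is a minimal Python sketch of that arithmetic. The label strings are shortened paraphrases of the RFC tables, not the exact names any Logstash plugin emits.

```python
from typing import Tuple

# Facility codes 0..23, paraphrased from the RFC 5424 table.
FACILITIES = [
    "kernel", "user-level", "mail", "system daemons",
    "security/authorization", "syslogd internal", "line printer",
    "network news", "UUCP", "clock daemon", "security/authorization",
    "FTP daemon", "NTP", "log audit", "log alert", "clock daemon",
] + [f"local{i}" for i in range(8)]

# Severity codes 0..7, paraphrased from the RFC 5424 table.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a PRI value into (facility, severity) codes."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI must be in 0..191, got {pri}")
    # PRI = facility * 8 + severity, so integer division undoes it.
    return divmod(pri, 8)


def describe_pri(pri: int) -> Tuple[str, str]:
    """Return human-readable (facility, severity) labels for a PRI value."""
    facility, severity = decode_pri(pri)
    return FACILITIES[facility], SEVERITIES[severity]
```

For example, `describe_pri(27)` gives `("system daemons", "error")`, which fits an OpenVPN write error.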

I don't fully understand your setup. Could you elaborate a bit? What is sending the syslog messages, rsyslog? Logstash is only ever receiving, never sending, right? At least I don't see a syslog output.

    if "beats_input_codec_plain_applied" in [tags] {
      mutate {
        remove_tag => ["beats_input_codec_plain_applied"]
      }
    }

BTW, you can drop this block by configuring the beats input not to add the tag in the first place (the include_codec_tag option): https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html#plugins-inputs-beats-include_codec_tag

Thanks, good to know.

Right. I have several servers running different services, and they all send their logs to this one log server via syslog (over UDP).

I have two grok filters, one for generic syslog messages and one for Apache logs. I noticed that if I paste in a log line that contains the hostname, the host field is set to the log server's hostname. I don't want that, so I replace the value of host with syslog_hostname and then delete syslog_hostname.

Thanks for the fast answer.

Sorry, I meant that I wasn't sure about your environment; I understood the Logstash config :slight_smile:
But now I also understand your environment. As for the hostname issue: the sending client is not including the hostname in the syslog message, right? Then it is a configuration issue on the client, which is why I was asking about rsyslog.
Or does Logstash mystically remove that part?

I'm now trying Filebeat, which reads the content of /var/log/syslog.

Well, I ran tcpdump and I can see that syslog doesn't even send the hostname. The hostname is only added afterwards.

Please be more clear. Syslog is a standard, not a product. Rsyslog and syslog-ng are the most common syslog servers. /var/log/syslog is just a file that one of those servers writes to.

So you have a /var/log/syslog file, which includes the hostname in each syslog message. But when you read it with Filebeat, which sends it to Logstash, the hostname is missing, correct?

And you used tcpdump to confirm that Filebeat strips the hostname from the message?

Sorry if I wasn't clear.

Currently I read the UDP stream from syslog directly with Logstash, process it, and send it to Elasticsearch. But then I noticed that the hostname was missing. A few minutes ago I ran tcpdump to inspect the raw packets from syslog, and found that it wasn't Logstash removing the hostname: syslog simply never sent it. Logstash just takes the raw packets from UDP.
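If tcpdump isn't at hand, a throwaway UDP listener shows the same thing: exactly which bytes the sender puts on the wire, before any parsing. Below is a minimal Python sketch. The helper names are hypothetical, and the self-check only uses loopback on an OS-assigned port. To inspect real traffic you would bind to port 5000 with Logstash stopped, since only one process can hold the port at a time.

```python
import socket
from typing import Optional


def open_listener(host: str = "127.0.0.1", port: int = 0,
                  timeout: Optional[float] = 5.0) -> socket.socket:
    """Bind a UDP socket; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    # A timeout avoids hanging forever; pass None to wait indefinitely.
    sock.settimeout(timeout)
    return sock


def receive_raw(sock: socket.socket, bufsize: int = 65535) -> bytes:
    """Return one datagram exactly as received, with no parsing or rewriting."""
    data, _sender = sock.recvfrom(bufsize)
    return data


def loopback_check(payload: bytes) -> bytes:
    """Send payload to a fresh local listener and return what arrived."""
    listener = open_listener()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, listener.getsockname())
        return receive_raw(listener)
    finally:
        sender.close()
        listener.close()
```

For live traffic, something like `sock = open_listener("0.0.0.0", 5000, timeout=None)` followed by a loop printing `receive_raw(sock)` would show whether the hostname is present in the datagram at all.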

The entries in /var/log/syslog get the hostname added when they are written to the file.

I am now trying to read the syslog file with Filebeat, so the hostname should be included when the events reach Logstash.

Maybe I am starting to grasp what is going on here. So you have a syslog server that writes to /var/log/syslog, and you have also configured it to forward the events to Logstash over UDP?
If this is the case, your syslog server is using a different template for writing to the file than for sending events over the network. Check your server's manual.

If you just want to read the /var/log/syslog file with Filebeat, there shouldn't be any issue, and you should be able to parse it properly with Logstash as long as you add the syslog priority to your grok pattern (or add a second pattern containing it, if you also receive syslog messages that don't include the priority).
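To illustrate handling both shapes with one pattern, here is a Python regex sketch that roughly mirrors the grok pattern from the config but makes the `<PRI>` prefix and the hostname optional. It is an approximation in Python, not grok itself. `syslog_pri` is a hypothetical field name, and the character classes are simplified compared to the real grok definitions. In grok terms, the equivalent change would be something like an optional `(?:<%{NONNEGINT:syslog_pri}>)?` in front of the timestamp plus an optional hostname group.

```python
import re
from typing import Dict, Optional

SYSLOG_LINE = re.compile(
    r"^(?:<(?P<syslog_pri>\d{1,3})>)?"                                  # optional <PRI>, e.g. <27>
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "  # e.g. "Jul 26 09:06:39"
    r"(?:(?P<syslog_hostname>[\w.-]+) )?"                              # optional hostname
    r"(?P<syslog_program>[^\s\[:]+)"                                   # program name
    r"(?:\[(?P<syslog_pid>\d+)\])?: "                                  # optional [pid]
    r"(?P<syslog_message>.*)$"
)


def parse_syslog_line(line: str) -> Optional[Dict[str, str]]:
    """Parse a BSD-style syslog line; omit fields that are absent."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

The optional hostname group only matches a token followed by a space, so in the network form (`openvpn[46263]: ...`) it is skipped and `openvpn` lands in `syslog_program`, while in the file form `hostname.local` is captured as the hostname.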

Well, I managed to get the messages with Filebeat. Previously I used only Logstash to receive the syslog messages.

I can't run the syslog service and Logstash concurrently because of a port conflict. Now I don't listen for syslog with Logstash anymore; instead I read the syslog file with Filebeat, which gives me the hostname. Filebeat then sends it to Logstash, where I can filter it.

So I think the issue is resolved now.

I still don't understand the second paragraph at all (where are all those components and services running?), but if you are happy with it, then great :slight_smile:

Note that Logstash has a syslog_pri filter that parses that number into its component fields (facility and severity).