Logstash not handling Byte Order Mark (BOM) correctly

See reply for update.

I have an rsyslog config that looks like this:

$ModLoad imtcp
$ModLoad imudp

$template myFormat,"<%pri%> %timestamp% <%syslogfacility%.%syslogpriority%> %hostname% %syslogtag%: %msg%\n"
$ActionFileDefaultTemplate myFormat

$template RemoteHost,"/var/log/syslog.log"

$RuleSet remote
*.* ?RemoteHost
*.* @@127.0.0.1:8417

$InputTCPServerBindRuleset remote
$InputUDPServerBindRuleset remote
$TCPServerAddress X.X.X.X
$UDPServerAddress X.X.X.X
$InputTCPServerRun 514
$UDPServerRun 514

And a Logstash config that looks like this:

input {
    tcp {
        host => '127.0.0.1'
        port => 8417
        type => syslog
    }

    udp {
        host => '127.0.0.1'
        port => 8417
        type => syslog
    }
}

filter {
}

output {
    stdout {
        codec => rubydebug
    }
}

This should be pretty simple, but Logstash is not interpreting my input correctly.

Here's an example input:

<190> Jul 26 10:52:02 <23.6> HOST-FOO (FPC: Slot 2, PIC Slot 2) ms22 mspmand[188]: msvcs_create_child_session: child session already exists

And this is what Logstash does with it:

"message" => "<190>Jul 26 10:52:02 HOST-FOO (FPC \xEF\xBB\xBFSlot 2, PIC Slot 2) ms22 mspmand[188]: msvcs_create_child_session: child session already exists",

This seems to be consistent across all messages. It looks like it does the following:

  • Removes the space between the syslog PRI and the timestamp
  • Removes the <syslogfacility.syslogpriority> entirely
  • Replaces seemingly random characters with a hex representation (\xEF\xBB\xBF)

There are a few more weird mutations I've seen but this is just one example. I'm running Logstash 6.2.3. Does anyone know what could cause this? Is there some kind of encoding I need to specify?

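It seems my formatting issue was in my rsyslog config. $ActionFileDefaultTemplate only applies to file actions, so the forwarding action was not using myFormat. I made the following change to the forwarding rule in the RuleSet: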

*.* @@127.0.0.1:8417;myFormat

I still get the hex escapes in the message, though. After some more research, it looks like \xEF\xBB\xBF is a UTF-8 Byte Order Mark (BOM) that gets added at the beginning of the message body. I did a packet capture and the BOM is already there when the syslog traffic arrives at my machine. I guess the file writer just handles it more gracefully than Logstash does. I added this line to my inputs to see if it would help:

codec => line { charset => "UTF-8" }

That still doesn't work. I'm not sure whether the better solution is to have rsyslog or Logstash remove the BOM, and so far I can't find a way to do it in either. Any advice or insight would be appreciated.
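
For reference, the three bytes shown as \xEF\xBB\xBF are the UTF-8 encoding of U+FEFF, the byte order mark. The Python sketch below is only an illustration outside Logstash; the describe helper is hypothetical, and the raw sample is copied from the message above on the assumption that it matches what the packet capture contains. It shows that the sequence is valid UTF-8, so a UTF-8 decoder will normally keep it as a character rather than drop it, which may be why the charset setting makes no difference here.

```python
# The UTF-8 byte order mark is the three-byte encoding of U+FEFF.
BOM_UTF8 = "\ufeff".encode("utf-8")

# Sample payload taken from the message above (assumption: these are
# the bytes actually seen on the wire).
raw = (
    b"HOST-FOO (FPC \xef\xbb\xbfSlot 2, PIC Slot 2) ms22 mspmand[188]: "
    b"msvcs_create_child_session: child session already exists"
)


def describe(payload: bytes) -> dict:
    """Report where a UTF-8 BOM sits and whether decoding keeps it."""
    text = payload.decode("utf-8")  # raises UnicodeDecodeError if invalid
    return {
        "bom_bytes": BOM_UTF8,
        "bom_offset": payload.find(BOM_UTF8),  # -1 if no BOM is present
        "decoded_keeps_bom": "\ufeff" in text,
    }


info = describe(raw)
print(info)
```

Python's utf-8-sig codec only drops a BOM at the very start of the input, so it would not help with a BOM in the middle of a line like this one.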

It doesn't look like there is any way to get rid of this on rsyslog's side. I tried using the mutate filter's gsub, but that doesn't seem to work because "\xEF\xBB\xBF" isn't actually ASCII. The best I can do right now is use the i18n filter's transliterate option, but that doesn't help much because it just inserts a "?" in place of the BOM. I'm thinking the only thing I can do is fork i18n and use an empty string as the replacement instead of a "?".
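
The behaviour I'm after, dropping the BOM bytes instead of substituting a "?", can be sketched as follows. This is plain Python for illustration only, not a Logstash filter or a patch to i18n; strip_bom is a hypothetical helper and the sample bytes are an excerpt of the message above.

```python
BOM_UTF8 = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def strip_bom(payload: bytes) -> bytes:
    """Remove every UTF-8 BOM sequence, wherever it appears in the payload."""
    return payload.replace(BOM_UTF8, b"")


# Excerpt of the sample message, with the BOM in the middle of the line.
raw = b"(FPC \xef\xbb\xbfSlot 2, PIC Slot 2) ms22 mspmand[188]"
clean = strip_bom(raw)
print(clean.decode("utf-8"))
```

Working on the raw bytes avoids having to care whether the string has already been tagged as UTF-8 or as binary by the time the replacement runs.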
