How to handle German umlauts or anything that's not UTF-8


(Rolf Kimmelmann) #1

Hello,

I had a small problem processing log messages that contain German umlauts, but I found a way to handle umlauts (or any input that is not UTF-8 encoded).

When you check logstash.log, you might find something like this:

:message=>"Received an event that has a different character encoding than you configured.", :text=>"here is your message", :expected_charset=>"UTF-8",
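To illustrate why this warning shows up, here is a minimal Python sketch. It is not part of Logstash, and the sample word is made up; it only assumes the sender writes its logs in CP1252 (Windows-1252) while the reader expects UTF-8:

```python
# Hypothetical sample: "Müller" (written with an escape so this file stays ASCII),
# encoded the way a CP1252 sender would write it.
raw = "M\u00fcller".encode("cp1252")   # the umlaut becomes the single byte 0xFC

# Reading those bytes as UTF-8 fails: 0xFC is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Declaring the real charset recovers the text, which can then be
# re-encoded as valid UTF-8 (the umlaut becomes the two bytes 0xC3 0xBC).
text = raw.decode("cp1252")
as_utf8 = text.encode("utf-8")

print(utf8_ok, text, as_utf8)
```

Declaring the charset on the input codec, as in the config further down, plays the role of the `decode("cp1252")` step here.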

Here is my solution:


input {
    lumberjack {
        port => 5043
        ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
        ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
        codec => plain { charset => "CP1252" } # you can use another charset, but this one works well for German
    }
}
filter {
    mutate {
        gsub => [
            # Replace the German umlauts (lowercase only; add more pairs as needed).
            # This is optional: Logstash already converts the umlauts
            # to the correct Unicode code points.
            "message", "ä", "ae",
            "message", "ö", "oe",
            "message", "ü", "ue"
        ]
    }
...
}

Logstash converts the umlauts to the correct Unicode code points. You don't have to replace them if you don't want to.

If you copy this into your logstash.conf from Windows (e.g. via Notepad++ and WinSCP) and then try to run it on a machine with a Unix-based OS, Logstash might not start and might log this message instead:

:message=>"Error: The following config files contains non-ascii characters but are not UTF-8 encoded ["/etc/logstash/central.conf"]"

In this case, use vim or nano directly on your Unix machine to insert the configuration.
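If retyping the file is not practical, re-encoding it is another option. The following is only a sketch, assuming the Windows editor saved the file as CP1252; the function name `reencode_to_utf8` is made up for this example and the path in the usage comment is the one from the error message above:

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8. Return True if it was changed."""
    p = Path(path)
    data = p.read_bytes()
    try:
        # Already valid UTF-8 (plain ASCII counts too): leave the file alone.
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Assumption: the bytes really are in source_encoding. Bytes that are
    # undefined in that encoding raise UnicodeDecodeError instead of
    # silently producing wrong characters.
    p.write_bytes(data.decode(source_encoding).encode("utf-8"))
    return True


# Usage (make a backup of the file first):
# reencode_to_utf8("/etc/logstash/central.conf")
```

The UTF-8 check comes first so that running the function twice does not convert an already fixed file a second time.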

Kind regards

Rolf


(Mark Walkom) #2

Nice solution :)


(Magnus Bäck) #3

I don't understand. If you declare a character set for your input plugin that matches the actual input text, won't Logstash do the right thing by reading the characters correctly and converting them to the correct Unicode codepoints?


(Rolf Kimmelmann) #4

You're right, Logstash converts them directly to the Unicode code points. It isn't necessary to replace the umlauts if you want to keep them.

I will add this. Thanks for the hint.

