Logstash-input-journald utf8 issues

Hi there!

The logstash-input-journald plugin does not seem to handle non-ASCII characters (like 'ö', 'ő', 'ű' etc.). If I log a string containing such characters, they come out garbled in the output (ES or file, it doesn't matter which). I'm fairly certain it all boils down to this line (or these two lines actually):

Forcing the encoding to ISO-8859-1 breaks every character that UTF-8 encodes as more than one byte, no surprise there. Now I see why this is done: journal fields may contain binary data. But I'm not even sure this workaround is safe all the time. Forcing the encoding to UTF-8 instead "fixes" the issue, but it's the same ugly hack. I don't think that any arbitrary byte sequence "makes sense" in UTF-8, or in ISO-8859-1 for that matter.

I tried to write a method that base64-encodes the field value if its encoding is ASCII-8BIT (i.e. it's binary), and leaves it untouched otherwise. This should work according to the journal documentation (https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html): "Primarily, fields are formatted UTF-8 text strings, and binary formatting is used only where formatting as UTF-8 text strings makes little sense." But it's not working: all fields come back ASCII-8BIT encoded, text or not. So it might be a bug in the underlying Ruby "systemd-journal" library, or even in the C API. (It has other issues too, so I wouldn't be surprised.)
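Since the encoding tag can't be trusted to tell text from binary, one alternative I can think of is to check the bytes themselves. Here is a minimal sketch of that idea. The method name `normalize_journal_value` is made up for illustration and is not part of the plugin or of systemd-journal. It relabels the value as UTF-8 and keeps it only if the bytes are valid UTF-8, otherwise it falls back to base64:

```ruby
# Hypothetical helper (not part of logstash-input-journald or systemd-journal).
# Assumption: journal values arrive as ASCII-8BIT strings whether or not
# they actually hold text, so the encoding tag alone tells us nothing.
def normalize_journal_value(value)
  # Relabel a copy of the bytes as UTF-8; this does not change any bytes.
  candidate = value.dup.force_encoding(Encoding::UTF_8)

  # Valid UTF-8 is treated as text and passed through.
  return candidate if candidate.valid_encoding?

  # Anything else is treated as binary and base64-encoded.
  # pack("m0") yields base64 without line breaks, using core Ruby only.
  [value].pack("m0")
end

text   = "\xC3\xB6".b   # the UTF-8 bytes of 'ö', tagged as binary
binary = "\xFF\xFE".b   # not valid UTF-8

puts normalize_journal_value(text)    # ö
puts normalize_journal_value(binary)  # //4=
```

The obvious weakness is that a binary value which happens to be valid UTF-8 would be passed through as text, and a consumer can't tell a base64 result from ordinary text without some extra marker. So this is still a heuristic, just a less destructive one.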

So... what do I do? :slight_smile:

Thanks!