UTF-8 in Windows


(Harvii Dent) #1

Hello,

There seems to be a problem with how Logstash (5.5.0) handles UTF-8 input on Windows (2008 R2/2012 R2). With the config below, any Arabic input comes out as question marks (??????????). This seems to be independent of the input plugin (I tried the 'file' and 'beats' inputs) and of the codec (JSON/PLAIN).

input {
    file {
        path => "C:\ELK\temp\input.txt"
    }
}
output {
    file {
        path => "C:\ELK\temp\output.txt"
    }
}

Using {charset => ["CP1252"]}, as proposed in this discussion, does fix the issue, even though the input is UTF-8.

Strangely, the above config works as expected on Linux without specifying the CP1252 'charset'.
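
For reference, a minimal sketch of the Windows workaround. It assumes the charset is set on a 'plain' codec attached to the file input; the linked discussion may place it differently, and the string form "CP1252" is used here instead of the array form quoted above:

input {
    file {
        path => "C:\ELK\temp\input.txt"
        # Assumption: the charset goes on the input's codec.
        # The file really is UTF-8; declaring CP1252 is only the workaround.
        codec => plain {
            charset => "CP1252"
        }
    }
}
output {
    file {
        path => "C:\ELK\temp\output.txt"
    }
}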

Any thoughts on this are appreciated.

Thanks

EDIT: I did some testing, and it seems this issue was introduced in v5.0.0; it worked as expected in v2.4.0.

