Hello,
There seems to be a problem with how UTF-8 input is handled in Logstash (5.5.0) on Windows (2008 R2/2012 R2). With the config below, any Arabic input comes out as question marks (??????????). This seems independent of the input plugin (I tried the 'file' and 'beats' inputs) and of the codec (JSON/PLAIN).
input {
  file {
    path => "C:\ELK\temp\input.txt"
  }
}
output {
  file {
    path => "C:\ELK\temp\output.txt"
  }
}
Using {charset => ["CP1252"]}, as proposed in this discussion, does fix the issue, even though the input is UTF-8.
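For reference, here is a minimal sketch of where that setting would sit. It assumes the charset is applied through the 'plain' codec on the file input; that placement is an assumption, since only the setting itself is quoted above, and the same option could equally go on a different input or codec.

```
input {
  file {
    path => "C:\ELK\temp\input.txt"
    # Assumed placement: tells the codec to treat the incoming bytes as CP1252
    codec => plain { charset => "CP1252" }
  }
}
output {
  file {
    path => "C:\ELK\temp\output.txt"
  }
}
```

The output section is unchanged from the original config.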
Strangely, the above config works as expected on Linux without specifying the CP1252 charset!
Any thoughts on this are appreciated.
Thanks
EDIT: I did some testing, and it seems this issue was introduced in v5.0.0; it worked as expected in v2.4.0.