Hello,
I recently migrated our cluster from ELK 2 to ELK 6.
One of our Logstash configs parses some PowerShell CSV output, which is UTF-16LE
(Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators)
In Logstash 2 I had the following simple config, which worked fine:
input {
  file {
    path => ["/data/*.csv"]
    start_position => "beginning"
    codec => plain { charset => "UTF-16LE" }
    type => "office365"
  }
}

filter {
  if [type] == "office365" {
    csv {
      columns => ["PSComputerName","RunspaceId","PSShowComputerName","RecordType","CreationDate","UserIds","Operations","AuditData","ResultIndex","ResultCount","Identity","IsValid","ObjectState"]
    }
  }
}

output {
  if [type] == "office365" {
    stdout { codec => rubydebug }
  }
}
However, since Logstash 6 the UTF-16 support seems to be broken. The CSV fails to parse, and the output message appears to contain only what look like Chinese characters.
Were there any changes to the UTF-16 decoder in logstash 6?
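I don't know what Logstash 6 is doing internally, but for reference, a single byte of misalignment (for example a half-consumed BOM) is enough to turn ASCII-range UTF-16LE into CJK-looking glyphs. A quick Python illustration of the symptom (my own construction, not Logstash's actual code path):

```python
# Reproduce the "Chinese letters" symptom: decode UTF-16LE bytes
# one byte out of alignment, so ASCII byte pairs land in the CJK range.
row = "RecordType,CreationDate"
good = row.encode("utf-16-le")      # 52 00 65 00 ... little-endian pairs
shifted = b"\x00" + good[:-1]       # one stray leading byte shifts every pair
mojibake = shifted.decode("utf-16-le")
print(mojibake)                     # CJK-looking glyphs, same length as `row`
```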
When I convert the input file to UTF-8 first using:
iconv -f UTF-16LE -t UTF-8 input.csv > output.csv
and remove the charset line from the codec, the file is parsed fine.