UTF-16 Broken since logstash 6

Hello,

I recently migrated our cluster from ELK2 to ELK6.
One of our logstash configs is used to parse some powershell CSV output, which is in UTF-16LE
(Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators)

In logstash 2 I had the following simple config, which worked fine:

input {
  file { 
    path => ["/data/*.csv"]
    start_position => "beginning"
    codec => plain { charset => "UTF-16LE"}
    type => "office365"
  }
}

filter {
  if [type] == "office365" {
    csv {
      columns => ["PSComputerName","RunspaceId","PSShowComputerName","RecordType","CreationDate","UserIds","Operations","AuditData","ResultIndex","ResultCount","Identity","IsValid","ObjectState"]
    }
  }
}

output {
  if [type] == "office365" {
    stdout { codec => rubydebug }
  }
}

However, since logstash 6 the UTF-16 support seems to be broken. The CSV fails to parse and the output message I get seems to only contain what seems to be Chinese letters.
Were there any changes to the UTF-16 decoder in logstash 6?

When I convert the input file to UTF-8 first using:
iconv -f UTF-16LE -t UTF-8 input.csv > output.csv
And remove the codec UTF-8 line, the file gets parsed fine

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.