UTF-16 Broken since logstash 6

maigel · June 12, 2018, 2:24pm

Hello,

I recently migrated our cluster from ELK 2 to ELK 6.
One of our Logstash configs parses some PowerShell CSV output, which is encoded as UTF-16LE
(Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators).

In Logstash 2 I had the following simple config, which worked fine:

input {
  file { 
    path => ["/data/*.csv"]
    start_position => "beginning"
    codec => plain { charset => "UTF-16LE" }
    type => "office365"
  }
}

filter {
  if [type] == "office365" {
    csv {
      columns => ["PSComputerName","RunspaceId","PSShowComputerName","RecordType","CreationDate","UserIds","Operations","AuditData","ResultIndex","ResultCount","Identity","IsValid","ObjectState"]
    }
  }
}

output {
  if [type] == "office365" {
    stdout { codec => rubydebug }
  }
}

However, since Logstash 6 the UTF-16 support seems to be broken. The CSV fails to parse, and the message in the output contains only what look like Chinese characters.
Were there any changes to the UTF-16 decoder in Logstash 6?
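
For illustration, here is one way mostly-ASCII UTF-16LE text can turn into Chinese-looking characters: a one-byte misalignment before decoding. This is only a sketch of the symptom in Python. I have not confirmed that this is what Logstash 6 actually does, and the function name `misaligned_decode` is made up for the example.

```python
def misaligned_decode(text: str) -> str:
    """Encode text as UTF-16LE, drop the first byte, and decode again.

    Hypothetical helper: it only simulates a decoder that starts one
    byte too late. It is not taken from Logstash.
    """
    raw = text.encode("utf-16-le")
    # Drop the first byte (misalignment) and the last byte so the
    # remaining length is still a multiple of two.
    shifted = raw[1:-1]
    return shifted.decode("utf-16-le")


if __name__ == "__main__":
    # Each ASCII letter's byte becomes the high byte of a code unit,
    # which puts it in the CJK ideograph ranges.
    for ch in misaligned_decode("Operations"):
        print(f"U+{ord(ch):04X} {ch}")
```

With a shift like this, every ASCII letter lands in the CJK blocks (U+3400 to U+9FFF), which would match the output described above.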

When I first convert the input file to UTF-8 using:
iconv -f UTF-16LE -t UTF-8 input.csv > output.csv
and remove the codec line (the one setting charset to UTF-16LE) from the config, the file gets parsed fine.
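
If iconv is not available, the same pre-conversion could be scripted. Below is a minimal Python sketch of that workaround. The function name and file names are placeholders, and it assumes the whole file is valid UTF-16LE.

```python
def convert_utf16le_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-16LE text file as UTF-8 (hypothetical helper)."""
    # newline="" disables newline translation, so CRLF terminators
    # are preserved exactly as in the source file.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            # Note: the "utf-16-le" codec does not strip a leading BOM;
            # if one is present it is carried through as U+FEFF.
            dst.write(line)


if __name__ == "__main__":
    # Placeholder file names, matching the iconv example.
    convert_utf16le_to_utf8("input.csv", "output.csv")
```

The converted file can then be read by the file input without a charset setting, since the plain codec defaults to UTF-8.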

system · July 10, 2018, 2:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
