Logstash CSV Filter Using Unicode delimiter ( SOH )

(Samnik60) #1

Hi guys,
I am trying to parse a CSV file whose delimiter is SOH, i.e. hex value \x01 (Unicode \u0001), but the Logstash csv filter seems to ignore the separator I specified:

separator => "\u0001"

Does the Logstash csv filter support Unicode or other special characters as a delimiter?


(Magnus Bäck) #2

The string type documentation doesn't mention anything about this so one should assume escape sequences are unsupported. See related thread below:

(Samnik60) #3

Hi Magnus,
Is this a feature missing from the Logstash csv filter, or a fundamental limitation of Logstash because its string data type doesn't allow Unicode characters?


(Magnus Bäck) #4

Logstash strings do not support escape sequences as a way to represent non-printable characters. If you can't put a literal \u0001 in the file (which the topic I linked to indicated didn't work) you're out of luck. This has nothing to do with the csv filter.

(Samnik60) #5

I worked around this issue: I used a mutate filter, as given below, to replace the Unicode delimiter with an ASCII comma and then parsed the result with the csv filter.

        gsub => [ "message","\u0001","," ]
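For context, here is a rough sketch of how the whole filter section might look with that workaround in place. It assumes the raw line is in the default message field and that the csv filter runs right after the mutate; the csv options shown are only the defaults written out, not something specific to my setup. The escape presumably works here because gsub treats its pattern as a regular expression, so the regex engine interprets \u0001 rather than the Logstash config parser.

```
filter {
  mutate {
    # Replace every SOH character (\u0001) in the raw line with a comma.
    # The pattern is handled as a regular expression.
    gsub => [ "message", "\u0001", "," ]
  }
  csv {
    # Assumption: parse the rewritten message field with a plain comma.
    source    => "message"
    separator => ","
  }
}
```

One caveat: if any field value can itself contain a comma, this replacement will split that field, so a replacement character that never occurs in the data would be a safer choice.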
