Logstash CSV filter using a Unicode delimiter (SOH)


(Samnik60) #1

Hi guys,
I am trying to parse a CSV file whose delimiter is SOH, i.e. hex value \x01 (Unicode value \u0001), but the Logstash csv filter seems to ignore the separator I specify.
For example:

    filter {
        csv {
            separator => "\u0001"
        }
    }

Does the Logstash csv filter support Unicode or other special characters as a delimiter?

Thanks,
sam


(Magnus Bäck) #2

The documentation for the string type doesn't mention escape sequences, so you should assume they are unsupported. See the related thread below:


(Samnik60) #3

Hi Magnus,
Is this a feature missing from the Logstash csv filter, or a fundamental limitation, i.e. Logstash cannot support this because it doesn't allow Unicode characters in its string data type?

Thanks,
sam


(Magnus Bäck) #4

Logstash strings do not support escape sequences as a way to represent non-printable characters. If you can't put a literal SOH character (\u0001) in the configuration file (which, according to the topic I linked to, didn't work), you're out of luck. This has nothing to do with the csv filter.
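To illustrate this outside Logstash, here is a small Python sketch (Python is used only for illustration, and the record values are made up). A raw string stands in for a config parser that does not process escape sequences: the separator then arrives as the six characters `\`, `u`, `0`, `0`, `0`, `1` rather than the single SOH control character, so splitting on it never matches SOH-delimited data.

```python
# Made-up SOH-delimited record; the \u0001 escapes below ARE processed,
# so each one becomes a single SOH (0x01) character.
soh = "\u0001"
line = "alice" + soh + "42" + soh + "london"

# A raw string models a config parser that does NOT process escapes:
# the "separator" is six printable characters, not SOH.
unprocessed_sep = r"\u0001"

print(len(soh))              # 1
print(len(unprocessed_sep))  # 6

# Splitting on the unprocessed text finds no match: one field comes back.
print(line.split(unprocessed_sep))  # ['alice\x0142\x01london']

# Splitting on the real SOH character yields the expected fields.
print(line.split(soh))  # ['alice', '42', 'london']
```

Getting a single, unsplit column back is consistent with the csv filter appearing to ignore the separator.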


(Samnik60) #5

I worked around the issue by using a mutate filter, as shown below, to replace the Unicode delimiter with an ASCII comma and then parsing the result with the csv filter:

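    filter {
        # gsub patterns are regular expressions, so the regex engine interprets \u0001 as SOH
        mutate { gsub => [ "message", "\u0001", "," ] }
        csv { separator => "," }
    }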
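The workaround succeeds because `gsub` treats its pattern as a regular expression, so the regex engine, not the config parser, interprets `\u0001`. Below is a minimal Python analogue of the same two steps; `soh_to_fields` is a hypothetical helper name, and Python's `re` and `csv` modules merely stand in for the mutate and csv filters. It also shows one caveat: a field that already contains a comma gets split in the wrong place, so the replacement character should be one that never occurs in the data.

```python
import csv
import io
import re


def soh_to_fields(message: str) -> list[str]:
    """Hypothetical helper mirroring the mutate (gsub) + csv workaround."""
    # Step 1 (like mutate gsub): the pattern is a raw string, so the \u0001
    # escape is interpreted by the regex engine, which matches SOH (0x01).
    converted = re.sub(r"\u0001", ",", message)
    # Step 2 (like the csv filter): parse the now comma-separated line.
    return next(csv.reader(io.StringIO(converted)))


print(soh_to_fields("alice\u000142\u0001london"))
# ['alice', '42', 'london']

# Caveat: a comma inside a field becomes an extra, unintended split.
print(soh_to_fields("bob\u0001London, UK"))
# ['bob', 'London', ' UK']
```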
